Title: A Quick Guide to Datawarehouse
1 Welcome To Loginworks Softwares
2A Quick Guide to Datawarehouse
What is Data Warehouse? A Data Warehouse is a
collective bunch of data of corporate world and
these retrieve data from the external source and
other operational systems. It handles the
enterprises daily transaction operations which
helps in making the business decision due to
reporting and analysis tools at different levels.
Furthermore, data is inserted into the warehouse
database through ETL process (data extraction,
data transformation, and data loading ) by using
OLTP applications like external providers and
mainframe applications etc.
3 Data Warehouse
The whole concept of DW was established in the
1980s by IBM employees Paul Murphy and Barry
Devlin. They formally enhanced Business Data
Warehouse. Therefore, this concept is specially
formed to provide the data flow from external
operational programs to make the better decision
for organizations perspective. Basically, Data
warehouse provides consolidated data,
transactional data, and historical data from
different sources. Data warehouse segregates the
workload analysis and enhances the organization
data from other external links.
4Why the need of Data Warehouse?
A data warehouse is a domain system of OLTP
(Online transaction processing). So, data can be
stored in the data warehouse through data
warehouse appliance for months and years to make
the historical analysis. A warehouse appliance is
a mixture of software and hardware tools.
Therefore, these tools store the data in the
database of a data warehouse. Basically, the
storage capacity of data warehouse comes around
in petabyte and terabyte ranges.
So, lets understand the practical meaning of
data warehousing with an example. So, lets
assume a big organization which stores data
(information) related to its employees, their
date of joining, date of leaving, their salaries,
product estimation, client and customer
information, invoices and sales etc. Furthermore,
the head of an organization may want any
information related to the measurement of
cost-reduction. As a result, the answer will rely
on historical analysis of the data. This is the
main advantage of data warehousing.
Business Analysis Framework This framework
tells the actual working of a data warehouse. A
(Business Analyst) BA retrieves the valuable
information from the DW (Data Warehouse).
Further, it identifies the performance and makes
crucial decision to win with other market
stakeholders. For this reason, a data warehouse
increases the organizations productivity. It
also provides the consistent workflow to make the
relationship strong with customers.
5 Properties Data warehouse
Integrated It consolidates the data from
external sources like relational DBMS, extended
DBMS, and Flat files etc. It also provides the
effective data analysis in the data
warehouse. Subject Oriented This is the main
feature of the data warehouse. Data warehousing
is the subject-oriented system because that
provides the data of a subject instead of
companys ongoing operations. These subject can
be the product, branch information, sales, and
revenue etc. A simple data warehousing does not
require any ongoing operations. It also provides
data analysis and modeling for decision making.
6Non-Volatile In this situation, old data is not
deleted while the new record is added to the
database. Old data is merged with new data by
using SQL JOINS query. A data warehouse is apart
from the external operational sources that do not
reflect the in the database of the data
warehouse. Time-Variant Extracted or collected d
ata is analyzed for a particular period of time.
Here, data provides in historical form.
7 ETL Process in Data Warehouse
ETL is a combination structure of extraction of
data, the transformation of data and loading of
data. These ETL process help in preparation of
data in the database management. It also extracts
data from external sources, then transforms the
data by using concatenation, data aggregation,
and calculations rules. Later, it loads the
information in the data warehouse. As a result,
it simply analyzes the business data to make the
crucial business decision. So, let us understand
the ETL process flow in brief as follows.
8Extraction Extract process reads the data from un
known external database and extracts in the
desired form to make the data easily accessible
for business objectives. On another hand, the
major point to extract data is to fetch the
relevant information from the system with minimum
resources. As a result, this step does not affect
the data. Transformation This function work like
filtration and data cleaning. This step is
applied to some important lookup rules and
formulas on extracted data as per the business
requirements. Then, it converts the data into the
desired table or a database. Therefore, this
phase of ETL contains verification, validation,
and rejection of data. First of all, it analyzes
the extracted data through verification that
defines the extracted data is in the correct
format or not. Secondly, It validates the extract
ed data means make sure that data is extracted
correctly what we required for. Here, you can use
SQL during the validation process it will help
you to find the correct data. You can even find
the duplicate data, sorting of data, data
conversion, and consistency of the data in the
database. After validation process, the query res
ult will give you a final data. Now, apply set of
rules to get the duplicate and unwanted data. As
a result, it automatically will reject the data.
This is how transformation process works.
9 Loading This is the final stage of ETL process.
This phase defines the resultant of data
extraction and transformation. After two steps of
EPL data then loads into the database of the data
warehouse. You can also take help of SELECT
command while data loading into the database. As
a result, once you load the data then refresh the
database.
10Data Warehouse Architecture Model (Three-Tier
Architecture)
A data warehouse is an RDBMS (Relational Database
management System), that is well developed for
query analysis. A data warehouse maintains huge
data in the database to make the changes in
historical analysis. In this session, I will
discuss the three-tier architecture model of the
data warehouse. In conclusion, this three-tier
architecture contains three layers as follows.
11Bottom-Tier Layer This is the bottom tier layer c
ontain all of the database servers in the data
warehouse which relates to the relational
database system. Data is extracted from external
and operational databases due to some backends
tool. Then, feed data into the database of the
warehouse by performing ETL process (Extract,
Transform, and Loading). Basically, we use ETL
process to load data into the bottom layer of the
data warehouse architecture model. In addition,
the bottom layer contains several entities like
as Data warehouse Data Marts Metadata Repositor
y Monitoring Administration 1. Data Warehouse
A data warehouse is a sophisticated form of
operational database that contains only required
information and allow to get fast access to the
information.
122. Data Marts Data Mart is an optimized version o
f the data warehouse which relies on a particular
subjected area. A data mart maintains access
layer to collect the relevant information from
the organizational data store. Therefore, data
marts only focus on specific business
architecture to build the warehouse in a large
size that takes less time and generate
organization revenue. Basically, it divides the
data into small portion like sales information,
customers information, and items information etc.
In addition, some key points need to remember of
data mart as follows. Data Marts are Windows, Uni
x and Linux based servers that create data marts.
It also contains low cost to create data marts.
Implementation of data marts cycles cannot be
stored for a long time means it only valid for
weeks not for months and years.
Data marts are much smaller in size but more
flexible. Data marts can customize the data by
their departments. Because of the user understand
ing, data mart requires less implementation as
compared to the data warehouse. As a result, data
mart can increase business process.
13 Types Data Mart
- Data marts contain its three type as follows
- Dependent data mart directly sourced from the
database of a data warehouse. - Independent data mart sourced with more than one
data warehouse. - Hybrid data mart converts the data from data
warehouse to operational database. - So, lets understand the data marts types in
brief. - Dependent Data Mart
- A dependent data mart access data source directly
from the single data warehouse database. This
type of data mart depends on only enterprise data
warehouse or centralized data warehouse. You can
access only one database at a time. But, if you
want more than one then you have to individually
configure them is called dependent data mart.
14 Independent Data Mart
This type of data mart works without the use of
any centralized system or data enterprise
warehouse. Independent data mart divides the data
into small portion within an organization.
Therefore, independent data mart does not contain
any relation value with data enterprise warehouse
and other operational data marts. Hence, it works
independently from the operational system. For
this reason, you need to develop new centralized
store where an end-user can analyze the data
multiple times.
15 Hybrid Data Mart
A hybrid data mart is a mixture of multiple
inputs from the external sources in the data
warehouse. Basically, it works on the ad-hoc
environment. It is designed to work on more than
a database and also provide the fastest
implementation for any big organization. It
requires less cleaning process and supports big
database. Hybrid data mart contains more
flexibility than dependent and independent data
marts for small data applications.
16 Implementation process of Data Mart Data mart
contains five implementation process as follows.
Designing Constructing Populating Accessing M
anaging
17Designing Designing is the 1st phase of implement
ation in the data mart, that covers all actions
between the initial request of data mart to
collective data through the basic data mart
requirements. It also discovers both physical
logical model of the data mart. In addition, this
phase contains some tasks as follows.
Collect the required technical and business
requirement. It verifies the source of data and s
elects the proper subset of data.
It helps in making the physical and logical
architecture of the data mart. Constructing Cons
tructing is the 2nd phase of datamart, it creates
the logical and physical architecture of the data
mart that provides fast access to data. In
addition, this phase contains some tasks as
follows. An RDBMS stores the constructed data whi
ch contains the addition of the data deletion
of the data. By using SQL query, you can perform
several functions like indexes, views, and
tables. Constructed data remains secure by using
SQL security libraries.
18Populating This is the third phase of data mart p
rocess, that covers all the steps which are used
in above two phase of data mart like getting the
data from the external source, cleaning of the
data and modifying of the data. In addition, this
phase contains some tasks as follows.
Select the target sources for getting the
data. Extraction of data from the sources. Data
cleaning and data transformation.
Loading and storing the data (metadata) into the
database. Accessing This is the 4th phase of dat
a mart implementation, this phase makes the data
usable for data analysis, querying the data,
creating the chart, graphs, and reports to
publish them into the external market for
business objectives. Furthermore, an end-user
submits the set of queries and display the final
result of the queries. Managing This is the fift
h last phase of data mart implementation, that
covers all of the tasks which relate to
management like as.
193. Metadata Repository Metadata is another form o
f simple data. A simple data which represent
different data format call as metadata. Metadata
segregate the especially relevant information of
the data. It also provides the detailed
information of the data. So, lets assume an
index of the notebook consumes as a metadata for
the internal content of the notebook. It also
defines the warehouse objects and roadmap in the
data warehouse. Now, understand the metadata
repository, which is an internal part of data
warehouse system. As a result, it contains some
metadata categories as follows.
Business Metadata This type of metadata covers
business information like business goals, owner
information, changing policies information and
business definition etc. Operational Metadata It
contains lineage of data and data currency. Data
currency defines the active mode of data, purged
and archived. Data lineage refers to the data
transformation and migration rules.
Mapping of data from operating system to Data
warehouse This type of metadata contains sources
database, the content of the data sources,
extraction of data, data partition, cleaning of
data, the transformation of data, purging of data
and refreshing of data rules etc.
Summarization Algorithms It contains data
aggregation, dimension algorithms, data
granularity and data summarization etc.
204. Monitoring Administration This element is ba
sically used for data refreshment and disaster
recovery. It set the limitation in size of data
warehouse and provide data in synchronization.
Further, it controls the ranges of query and
numbers format. It helps to increase performance
better of the database. Middle-Tier Layer In the
middle-tier, there is an OLAP server that works
as a single processing combination of ROLAP and
MOLAP database systems. This OLAP server provides
the multidimensional way to get the data model.
For this reason, analysts and managers can
provide consistent and fastest way to get the
information. ROLAP stands for Relational On-Line
Analytical Process that manipulate the stored
data in the relational database. ROLAP model can
easily maintain the large data. MOLAP stands for
Multidimensional OLAP system that maintains
dynamic multidimensional operations and the data.
Data can be stored in the multidimensional cube.
It provides fastest data retrieval due to MOLAP
cubes. The whole structure of middle layer is to
represent the abstracted view of the data
warehouse. As a result, this layer contains
common behavior between database and end-user.
21Components of OLAP Server There are three compone
nts in OLAP system as follows. Relational OLAP M
ultidimensional OLAP Hybrid OLAP Relational OLAP
Basically, ROLAP works on the stored data in
the relational database. ROLAP server defines the
relationship between front-end(client) tools and
relational back-end tools. ROLAP server provides
the way to manage and store the data by using
relational and extended DBMS. Therefore, this
server can easily handle a large number of data.
But, it contains a drawback that it has some
limited SQL functionalities to maintain the
database and aggregate tables. Multidimensional O
LAP Multidimensional OLAP system handles the mult
idimensional data through array-based storage
systems. It also provides an optimal solution for
dice and slice operations. MOLAP perform the
fastest access of the data in comparison with
relational OLAP. It can also handle complex
calculations. But, it contains some drawbacks
like data aggregation cannot perform while
changing the dimension of cubes. MOLAP maintains
a small number of data.
22Hybrid OLAP Hybrid is a complete mixture of MOLAP
ROLAP servers. A hybrid server provides the
fastest accessibility than MOLAP ROLAP servers.
It contains better scalability and computation.
It also maintains a big number of data (More than
a terabyte). Note OLAP contains additional type
as Specialized SQL servers, that allows to
execute advanced functions of SQL query and
provide better processing support than Snowflake
and Star schemas. So, lets look at the differenc
e between OLAP and OLTP. This is the most
important part in data warehouse, which could be
asked in the interviews for data warehousing
profile.
23Top-Tier Layer Top-Tier is a front-end client-sid
e layer in a three-tier model of the data
warehouse. This contains reporting query tools,
data mining data analysis tools. Here,
reporting tools contain report writers
productive reporting tools. Analysis tools are
also used to make charts based on data mining
result. Data mining tools define the especially
relevant information from the hidden patterns.
Now, we have completed the three-architecture
model of the data warehouse. Conclusion I hope y
ou guys enjoyed this tour of the data warehouse.
A simple data warehouse is a hub where the
especially relevant information resides in the
database. There are many databases for storing
the value but I explained the relational and
extended database management system. Because they
both contain maximum data storage than other
databases. This is a complete tour since
definition to its conclusion. In addition, this
article will help to learn what exactly data
warehouse is? And why is this so important?
I also explained the whole concept of the data
warehouse from the scratch which will help to
learn for freshers in their interviews and the
exams. In the data warehouse architecture,
metadata plays a vital role as it species the
values, the point of connection, source, features
and usage of the data warehouse. In addition,
last but not the least the main topic of this
article is the difference between OLAP and OLTP.
So, most of the companies who hire a candidate or
employee for data warehousing profile they must
ask these differences.
24 Thanks For Watchin
g Connect With Source Url - https//b
it.ly/2OvTimI Call Us- 1-43
4-608-0184