Title: Introduction Business intelligence and data warehousing
1 IntroductionBusiness intelligence and data
warehousing
Dr. Reuven Bakalash
2Business Intelligence
- The purpose of business intelligence is to
convert the volume of data into value for the end
users.
Value
Decision
Knowledge
Information
Data
Volume
3Categories of Business Intelligence Tools
- Reporting tools
- Query tools (data access)
- On-line analytical processing (OLAP) tools
- Data mining tools
- Analytical applications
4Operational and Informational Data Systems
Warehouse (informational)
Operational
Load
Read
Insert Read Update Delete
5Operational vs. Informational Data Stores
- Operational data is created by a variety of
on-line transaction systems (OLTP) (e.g.
financial, order entry, work scheduling, and
point-of-sales systems). It focuses on
transactional functions (such as bank withdrawal
and deposits). This is a detailed, nonredundant,
and updateable data and it reflects current
values. Questions as How many gadgets were sold
to a customer nbr. 123456 on September 11? - Informational data focuses on providing answers
to problems posed by decision makers. It is
summarized, is redundant to support varying data
views, and is nonupdateable. A series of
snapshots of instances over time. Questions as
What three products resulted in the most
frequent calls to the hotline over the past
quarter.
6Operational systems
- Organized by application
- Support daily business processing on a detailed
transactional level - Are update-intensive
- Use current data
- Are optimized for high performance
- Access few record per transaction, often direct
access by primary key - Support a large number of relatively short
transactions - Support a large number of concurrent users
7Operational vs. Informational
- Ad-hoc access
- Multidimensional data model
- Recent, aggregated, derived, and historical data
- Periodic, scheduled batch updates
- Queried data
- Millions
- Hundreds
- Relatively low
- Analytical, managerial
- Many complex, compound
- Second to minutes
- Ad-hoc, unstructured, heuristic
- Predifined , structured access
- Relational 2D tables
- Current data or a snapshot of recent data
- Continuous high-frequency changes
- Subject to concurrent update, insert, and delete
- Tens
- Thousands
- High
- Clerical, operational
- Few, simple
- Subseconds
- Predictable, repetitive
Data access
Data model
Time base
Data changes
Unit of work
Record range accessed per transaction
Concurrent users
Transaction volume
Users
Nbr of indexes
Response
Usage
8The Enterprise Decision-Making Loop
Transactional Systems
Business Intelligence Software
Analytical Applications
9Data Warehouse
An enterprise structured repository of
subject-oriented, time-variant, historical data
used for information retrieval and decision
support. The data warehouse stores atomic and
summary data.
10Analysis by Data Warehouse
- Retrospective analysis
- Focuses on the issues of past and present events.
For example, in-depth analysis of the performance
of the sales organization for the last 2 years
across different geographic regions,
demographics, and types of products - Predictive analysis
- Focuses on predicting certain events or behavior
based on historical information. For example, to
build a predictive model which describes the
attrition rates of their customers to the
competition and also defines steps to reduce the
attrition with certain degree of confidence
11Application types and techniques for analysis
- Classification
- Classifying database records into a number of
predefined classes based on certain criteria.
e.g. a credit card company may classify customer
records as good, medium, or poor risk. A
classification system may generate a rule stating
that A customer who earns more than 40,000, is
between 45 and 55 years of age, and lives within
a particular zip code is a good credit risk. - Clustering and segmentation
- Segmenting a database into subsets, or clusters,
based on a set of attributes. e.g. in the process
of understanding its customer base, an
organization may segment the known population and
thus discover clusters of potential customers on
the basis of attributes (type of school they
attended, number of vacations per year). Clusters
can be analyzed by a program, or by using
visualization techniques. - Associations
- Identifying similarities (affinities) among the
collection as reflected in the examined records.
These affinities are often expressed as rules,
e.g., 60 of all the records that contain items
A and B also contain C and D. The percentage of
occurrences is the confidence factor of the
association. - Sequencing
- Identifying patterns over time. e.g. an analysis
of customers purchases during separate visits.
It could be found, for instance, that a customer
who buys engine-oil and filter during one visit
will buy gasoline additive next time.
12Typical Uses of a Data Warehouse
- Airline
- Banking
- Health Care
- Investment
- Insurance
- Retail
- Telecommunications
- Manufacturing
- Credit card suppliers
- Clothing distributors
13Warehouse Components
Data
Access
Source
Data Mart
Data Analysis and reporting
Operational data
Data Extract Data Cleanup Data Load
OLAP
Data Mining
External data
14(No Transcript)
15Data Mart
- Data Mart contains a subset or summary of data in
the warehouse that is tailored to the specific
needs of a department or business area.
16Data Mining
- Data mining is the exploration and analysis of
large quantities of data in order to discover
meaningful patterns, trends, relationships, and
rules. - Data mining is also known as
- Knowledge discovery
- Data surfing
- Data harvesting
17Data Mining Tools
- Identify patterns and relationships in data that
are often useful for building models that aid
decision making or predict behavior - Data mining uses technologies such as neural
networks, rule induction, and clustering to
discover relationships in data and make
predictions that are hidden, not apparent, or to
complex to be extracted using statistical
techniques.
18Uses of Data Mining
- Customer profiling
- Market segmentation
- Buying pattern affinities
- Database marketing
- Credit scoring and risk analysis
19On-line analytical processing (OLAP)
- OLAP refers to the interactive analysis of
multidimensional information. It is used
primarily to support decision making within an
enterprise. - OLAP involves three basic activities
- Applying mathematical operators to various
measures (e.g. revenue and expense) to create
other measures (e.g. profit). - Reorganizing (pivoting) measures along a variety
of dimensions (e.g. pivoting data to view first
expenses by year, then expenses by department). - Aggregating measures (drill up) and decomposing
aggregates (drill down) to different hierarchical
levels within a dimension (e.g. drilling down
from expenses by year to expenses by quarter).
20OLAP
Prod
Regional mgr. view
Product mgr. view
Geography
Sales
Time
Financial mgr. view
Ad hoc view
21Multidimensional Query Techniques
Why?
- SlicingDicingDrilling down
- Drilling up
- Drilling across
- Pivoting
What?
Why?
Why?
22Comparing OLAP and Data Mining Queries
- OLAP queries
- Based on prior knowledge and assumptions
- User-driven
- Data mining queries
- Require domain-specific knowledge to interpret
data - User-guided
23Data Mining - visualization of association rules
24Examples of OLAP analytical questions
- Give me the sales figures of last month segmented
by country and by salesman - Give me the average monthly sales figures of the
last year segmented by salesman and product group
25Analytical Tasks
Question
Task
Tool
Production reporting
- Reports What were sales by region last
quarter?
Ad hoc query and analysis
Query What is driving the tools increase
in North American sales?
Advanced analysis
Given the rapid increase OLAP in Web
sales, what will total sales be for the rest
of the year?
26Dimensionality of data
- Data warehouse repository stores data organized
as a multidimensional model. The independent
variables (e.g. product, customer, time period)
form the dimensions of the array by which the
data (e.g. quantity) is organized. The
independent variables are often related in
hierarchies. - Dimensional modeling is the only viable technique
for delivering data to end users in a data
warehouse. The transaction-oriented database of
operational systems can not be queried.
27Dimensionality of data
- The system must provide a multidimensional
conceptual view of the data, including full
support for hierarchies and multiple hierarchies,
as this is certainly the most logical way to
analyze businesses and organizations.