Introduction Business intelligence and data warehousing - PowerPoint PPT Presentation

1 / 27
About This Presentation
Title:

Introduction Business intelligence and data warehousing

Description:

Operational data is created by a variety of on-line transaction ... who buys engine-oil and filter during one visit will buy gasoline additive next time. ... – PowerPoint PPT presentation

Number of Views:348
Avg rating:3.0/5.0
Slides: 28
Provided by: csBg
Category:

less

Transcript and Presenter's Notes

Title: Introduction Business intelligence and data warehousing


1
IntroductionBusiness intelligence and data
warehousing
Dr. Reuven Bakalash
2
Business Intelligence
  • The purpose of business intelligence is to
    convert the volume of data into value for the end
    users.

Value
Decision
Knowledge
Information
Data
Volume
3
Categories of Business Intelligence Tools
  • Reporting tools
  • Query tools (data access)
  • On-line analytical processing (OLAP) tools
  • Data mining tools
  • Analytical applications

4
Operational and Informational Data Systems
Warehouse (informational)
Operational
Load
Read
Insert Read Update Delete
5
Operational vs. Informational Data Stores
  • Operational data is created by a variety of
    on-line transaction systems (OLTP) (e.g.
    financial, order entry, work scheduling, and
    point-of-sales systems). It focuses on
    transactional functions (such as bank withdrawal
    and deposits). This is a detailed, nonredundant,
    and updateable data and it reflects current
    values. Questions as How many gadgets were sold
    to a customer nbr. 123456 on September 11?
  • Informational data focuses on providing answers
    to problems posed by decision makers. It is
    summarized, is redundant to support varying data
    views, and is nonupdateable. A series of
    snapshots of instances over time. Questions as
    What three products resulted in the most
    frequent calls to the hotline over the past
    quarter.

6
Operational systems
  • Organized by application
  • Support daily business processing on a detailed
    transactional level
  • Are update-intensive
  • Use current data
  • Are optimized for high performance
  • Access few record per transaction, often direct
    access by primary key
  • Support a large number of relatively short
    transactions
  • Support a large number of concurrent users

7
Operational vs. Informational
  • Ad-hoc access
  • Multidimensional data model
  • Recent, aggregated, derived, and historical data
  • Periodic, scheduled batch updates
  • Queried data
  • Millions
  • Hundreds
  • Relatively low
  • Analytical, managerial
  • Many complex, compound
  • Second to minutes
  • Ad-hoc, unstructured, heuristic
  • Predifined , structured access
  • Relational 2D tables
  • Current data or a snapshot of recent data
  • Continuous high-frequency changes
  • Subject to concurrent update, insert, and delete
  • Tens
  • Thousands
  • High
  • Clerical, operational
  • Few, simple
  • Subseconds
  • Predictable, repetitive

Data access
Data model
Time base
Data changes
Unit of work
Record range accessed per transaction
Concurrent users
Transaction volume
Users
Nbr of indexes
Response
Usage
8
The Enterprise Decision-Making Loop
Transactional Systems
Business Intelligence Software
Analytical Applications
9
Data Warehouse
An enterprise structured repository of
subject-oriented, time-variant, historical data
used for information retrieval and decision
support. The data warehouse stores atomic and
summary data.
10
Analysis by Data Warehouse
  • Retrospective analysis
  • Focuses on the issues of past and present events.
    For example, in-depth analysis of the performance
    of the sales organization for the last 2 years
    across different geographic regions,
    demographics, and types of products
  • Predictive analysis
  • Focuses on predicting certain events or behavior
    based on historical information. For example, to
    build a predictive model which describes the
    attrition rates of their customers to the
    competition and also defines steps to reduce the
    attrition with certain degree of confidence

11
Application types and techniques for analysis
  • Classification
  • Classifying database records into a number of
    predefined classes based on certain criteria.
    e.g. a credit card company may classify customer
    records as good, medium, or poor risk. A
    classification system may generate a rule stating
    that A customer who earns more than 40,000, is
    between 45 and 55 years of age, and lives within
    a particular zip code is a good credit risk.
  • Clustering and segmentation
  • Segmenting a database into subsets, or clusters,
    based on a set of attributes. e.g. in the process
    of understanding its customer base, an
    organization may segment the known population and
    thus discover clusters of potential customers on
    the basis of attributes (type of school they
    attended, number of vacations per year). Clusters
    can be analyzed by a program, or by using
    visualization techniques.
  • Associations
  • Identifying similarities (affinities) among the
    collection as reflected in the examined records.
    These affinities are often expressed as rules,
    e.g., 60 of all the records that contain items
    A and B also contain C and D. The percentage of
    occurrences is the confidence factor of the
    association.
  • Sequencing
  • Identifying patterns over time. e.g. an analysis
    of customers purchases during separate visits.
    It could be found, for instance, that a customer
    who buys engine-oil and filter during one visit
    will buy gasoline additive next time.

12
Typical Uses of a Data Warehouse
  • Airline
  • Banking
  • Health Care
  • Investment
  • Insurance
  • Retail
  • Telecommunications
  • Manufacturing
  • Credit card suppliers
  • Clothing distributors

13
Warehouse Components
Data
Access
Source
Data Mart
Data Analysis and reporting
Operational data
Data Extract Data Cleanup Data Load
OLAP
Data Mining
External data
14
(No Transcript)
15
Data Mart
  • Data Mart contains a subset or summary of data in
    the warehouse that is tailored to the specific
    needs of a department or business area.

16
Data Mining
  • Data mining is the exploration and analysis of
    large quantities of data in order to discover
    meaningful patterns, trends, relationships, and
    rules.
  • Data mining is also known as
  • Knowledge discovery
  • Data surfing
  • Data harvesting

17
Data Mining Tools
  • Identify patterns and relationships in data that
    are often useful for building models that aid
    decision making or predict behavior
  • Data mining uses technologies such as neural
    networks, rule induction, and clustering to
    discover relationships in data and make
    predictions that are hidden, not apparent, or to
    complex to be extracted using statistical
    techniques.

18
Uses of Data Mining
  • Customer profiling
  • Market segmentation
  • Buying pattern affinities
  • Database marketing
  • Credit scoring and risk analysis

19
On-line analytical processing (OLAP)
  • OLAP refers to the interactive analysis of
    multidimensional information. It is used
    primarily to support decision making within an
    enterprise.
  • OLAP involves three basic activities
  • Applying mathematical operators to various
    measures (e.g. revenue and expense) to create
    other measures (e.g. profit).
  • Reorganizing (pivoting) measures along a variety
    of dimensions (e.g. pivoting data to view first
    expenses by year, then expenses by department).
  • Aggregating measures (drill up) and decomposing
    aggregates (drill down) to different hierarchical
    levels within a dimension (e.g. drilling down
    from expenses by year to expenses by quarter).

20
OLAP
Prod
Regional mgr. view
Product mgr. view
Geography
Sales
Time
Financial mgr. view
Ad hoc view
21
Multidimensional Query Techniques
Why?
  • SlicingDicingDrilling down
  • Drilling up
  • Drilling across
  • Pivoting

What?
Why?
Why?
22
Comparing OLAP and Data Mining Queries
  • OLAP queries
  • Based on prior knowledge and assumptions
  • User-driven
  • Data mining queries
  • Require domain-specific knowledge to interpret
    data
  • User-guided

23
Data Mining - visualization of association rules
24
Examples of OLAP analytical questions
  • Give me the sales figures of last month segmented
    by country and by salesman
  • Give me the average monthly sales figures of the
    last year segmented by salesman and product group

25
Analytical Tasks
Question
Task
Tool
Production reporting
  • Reports What were sales by region last
    quarter?

Ad hoc query and analysis
Query What is driving the tools increase
in North American sales?
Advanced analysis
Given the rapid increase OLAP in Web
sales, what will total sales be for the rest
of the year?
26
Dimensionality of data
  • Data warehouse repository stores data organized
    as a multidimensional model. The independent
    variables (e.g. product, customer, time period)
    form the dimensions of the array by which the
    data (e.g. quantity) is organized. The
    independent variables are often related in
    hierarchies.
  • Dimensional modeling is the only viable technique
    for delivering data to end users in a data
    warehouse. The transaction-oriented database of
    operational systems can not be queried.

27
Dimensionality of data
  • The system must provide a multidimensional
    conceptual view of the data, including full
    support for hierarchies and multiple hierarchies,
    as this is certainly the most logical way to
    analyze businesses and organizations.
Write a Comment
User Comments (0)
About PowerShow.com