DataWarehousing and DataMining - PowerPoint PPT Presentation

1 / 35
About This Presentation
Title:

DataWarehousing and DataMining

Description:

DataWarehousing and DataMining Prof. Sin-Min Lee DATA WAREHOUSE, OLAP, and DATA MINING Concepts Data warehousing OLAP (On-Line Analytical Processing) Data mining Case ... – PowerPoint PPT presentation

Number of Views:128
Avg rating:3.0/5.0
Slides: 36
Provided by: DanielR176
Learn more at: http://www.cs.sjsu.edu
Category:

less

Transcript and Presenter's Notes

Title: DataWarehousing and DataMining


1
DataWarehousing and DataMining
Lecture 25 CS 157B
  • Prof. Sin-Min Lee

2
DATA WAREHOUSE, OLAP, and DATA MINING
  • Concepts
  • Data warehousing
  • OLAP (On-Line Analytical Processing)
  • Data mining
  • Case Studies
  • WebTarget (USN)
  • TFDW (USMC)

3
(No Transcript)
4
(No Transcript)
5
(No Transcript)
6
(No Transcript)
7
(No Transcript)
8
(No Transcript)
9
(No Transcript)
10
(No Transcript)
11
(No Transcript)
12
(No Transcript)
13
(No Transcript)
14
(No Transcript)
15
(No Transcript)
16
DATA WAREHOUSE
  • DATABASE MANAGEMENT IN THE INTERNET ERA
  • CLIENT/SERVER - BASED
  • ANALYTICAL vs OPERATIONAL (OLAP vs OLTP)
  • MULTI-DIMENSIONAL ANALYSIS
  • DATA WAREHOUSE (ENTERPRISE-WIDE) vs
  • DATA MART (FUNCTIONAL AREA)

17
MULTIDIMENSIONAL NATURE OF DATA WAREHOUSES
  • BORING QUERY How many Sailors/Marines chose not
    to stay in the Navy/Marine Corps this year?
  • USEFUL QUERY What was our retention
    (separation) rate this year by community by
    paygrade by years of service by gender by rating
    and how did it compare to last year and what can
    we expect next year?

18
DW ARCHITECTURE

19
DW 3-TIER ARCHITECTURE

20
1. DATA QUALITY DATA CLEANSING
  • 1 REASON FOR DW PROJECT FAILURE
  • PROBLEMS
  • - Database heterogeneity
  • - Data heterogeneity
  • FUNCTIONALITY OF TOOLS
  • - Removing unwanted data from operational
    databases
  • - Converting to common data names and
    definitions
  • - Calculating summaries and derived data
  • - Establishing defaults for missing data
  • - Accommodating source data definition changes

21
APPROACHES TO DATA CLEANSING
  • AUTOMATIC CODE GENERATION
  • Creates code to convert from source to target
    data
  • DATA REPLICATION TOOLS
  • Captures changes to source database from
    recovery logs and database triggers and
    propagates changes to copies of the data
  • DYNAMIC TRANFORMATION ENGINES
  • Rule-driven systems that capture data from
    source databases at user-defined intervals,
    transform it, and export it to a data
    warehouse/mart target

22
2. METADATA (What does the data mean?)
  • Logical Structure of DW Including End User Views
  • Identification of Authoritative Data Sources
  • Transformation Rules for Populating DW
  • Transformation Rules to Deliver Data to End-User
    Analytical Tools
  • Subscription Information for Information Delivery
  • DW Operational Information
  • DW Usage Metrics
  • Security Authorizations, Access Control Lists,
    etc.

23
3. DATA WAREHOUSE DATABASE
  • PARALLEL COMPUTING PLATFORMS
  • Exs Symmetric (Shared) Multiprocessors (SMPs)
  • Massively Parallel Processors (MPPs)
  • ROLAP
  • Relational DBMS with Heavy Duty Indexing
    Capabilities
  • MOLAP
  • Multidimensional Databases (MDDBs)
  • 3rd Party Tools that Augment Relational Model

24
4. DATA MARTS
  • A Data Warehouse Focused on a Specific Subject
    Area
  • Subsidiary to a Data Warehouse of Integrated Data
  • More Rapidly Deployable than a Data Warehouse
  • Subject-based vice Enterprise-based

25
5. ACCESS TOOLS
  • QUERY AND REPORTING TOOLS
  • - Managed query tools Layer between user and
    SQL (e.g., BrioQuery)
  • - Configurable report generators (e.g., Brios
    BrioReport)
  • APPLICATIONS
  • - Application development platforms (e.g.,
    PowerSofts PowerBuilder Microsofts Visual
    Basic)

26
ACCESS (contd)
  • OLAP
  • - Support of multidimensional analysis
  • - Ability to drilldown and rollup along any of
    the
  • predefined dimensions
  • - Major vendors Cognos, Business Objects, Brio

27
MULTIDIMENSIONAL DATA MODEL STAR SCHEMA
  • FACTS Core data element being analyzed, e.g.,
    Units_of_Items_Sold
  • DIMENSIONS Attributes about FACTS, e.g.,
    Product_Type, Purchase_Date

28
ROLE OF METRICS
  • Facts should be defined as Measures of
    Effectiveness (sometimes called Key Performance
    Indicators (KPIs))
  • Exs NEC Reutilization Rate
  • Retention Rate
  • Attrition Rate
  • Readiness (Personnel)

29
COGNOS DEMO
  • http//www.cognos.com/products/tours/analysis_laun
    ch.html

30
ACCESS Data Mining
  • Searching for meaningful patterns in large data
    sets
  • Knowledge acquisition
  • Motivated and facilitated by
  • Availability of large data sets
  • Advances in storage technology
  • Data warehouse technology
  • E-commerce and the Internet
  • Exploratory vs. confirmatory analysis

31
6. DW ADMINISTRATION AND MANAGEMENT
  • Normal DBA Responsibilities plus
  • Source Data Quality Checks
  • Keeping track of what all the source data means
  • Managing Very Large Databases (gigabytes or
    terabytes in size)

32
7. INFORMATION DELIVERY SYSTEM
  • How to get information from the data warehouse to
    users?
  • Users subscribe to the data warehouse.
  • Specifically, they subscribe to specific reports
    to be delivered on a periodic basis.
  • Reports are delivered to users Web browser as
    per prescribed frequency.
  • Powerful tool for delivering information to the
    people who need it in an extremely timely
    fashion. True MIS true DSS.

33
BENEFITS OF DATA WAREHOUSE
  • Freedom from restrictions of operational
    databases
  • Decision-oriented
  • Extremely efficient presentation of management
    information
  • Widespread access to critical information for
    those who need it when they need it
  • Knowledge discovery
  • Improves business intelligence
  • Relatively inexpensive to implement
  • Does not require re-engineering of legacy systems

34
GIS GEOGRAPHIC INFORMATION SYSTEMS
  • Ability to visualize data spatially
  • Maps on top of a relational DBMS
  • Data is viewed on maps vice from tables
  • Features
  • - Thematic maps
  • - Spatial queries
  • - Geocoding of data
  • Vendors MapInfo ESRI (ArcInfo)

35
(No Transcript)
Write a Comment
User Comments (0)
About PowerShow.com