Data Warehousing Tools - PowerPoint PPT Presentation

1 / 18
About This Presentation
Title:

Data Warehousing Tools

Description:

Data Warehousing Tools & Technologies, and Data Marts. Amrita Yadav. Topic Number: 33 ... Warehousing Tools and Technologies. Extraction, Cleansing and ... – PowerPoint PPT presentation

Number of Views:59
Avg rating:3.0/5.0
Slides: 19
Provided by: amr62
Category:

less

Transcript and Presenter's Notes

Title: Data Warehousing Tools


1
Data Warehousing Tools Technologies, and Data
Marts
  • Amrita Yadav
  • Topic Number 33

2
Introduction
  • Warehousing Tools and Technologies
  • ? Extraction, Cleansing and Transformation
    Tools? Data Warehouse DBMS? Data Warehouse
    Meta-data? Administration and Management Tools
  • Data Marts

3
Extraction, Cleansing Transformation Tools
  • Can be done separately or using Integrated
    Solutions (categorized as below)
  • Code Generators create customized 3GL/4GL
    transformation programs based on source and
    target data definitions large number of
    programs
  • DB Data Replication Tools use triggers and/or
    recovery logs to capture changes on one
    source system and apply to another system
    support only relational files/DB
  • Dynamic Transformation Engines rule-driven
    dynamic capture data from source at
    user-defined intervals transform and load
    results into larger environment

4
Data Warehouse DBMS
  • Load Performance periodic incremental loads
    within narrow time windows
  • Load Processing processing of data during the
    load (conversion, filtering, reformatting,
    integrity checks, indexing, meta-data updates,
    etc.)
  • Data Quality Management ensure referential
    integrity, and global and local consistency
    inspite of dirty sources and massive sizes
  • Query Performance reasonable amount of
    execution time for large/complex queries
    shouldnt be inhibited by the DW DBMS performance
  • Terabyte Scalability No limit to size and size
    shouldnt affect query performance support
    modular and parallel management

5
Data Warehouse DBMS (contd.)
  • Mass User Scalability capability to support
    100s to 1000s of concurrent users
  • Networked DW cooperation in larger n/ws using
    tools to move data subsets b/w warehouses
    access multiple DWs from single client
    workstation
  • Warehouse Administration implementing resource
    limiting, chargeback accounting priority
    workload tracking and tuning
  • Integrated Dimensional Analysis support for
    multi-dimensional views tools to automate
    creation of pre-computed aggregations
  • Advanced Query Functionality provide
    analytical operations for calculations,
    sequential comparative analysis, consistent
    access to detail summary data

6
Parallel DBMSs
  • Proposed as a solution
  • Large decision support DBMSs using parallel
    technologies over multiple nodes
  • Characteristics Scalability Operability
    Availability
  • Requires multiple processors, memory, disks, and
    network connections
  • Perform many operations simultaneously
  • Tasks are split and distributed across multiple
    processors
  • Unlike serial processing wherein computational
    steps are performed sequentially

7
Parallel DBMSs (contd.)
  • Centralized and Client-Server DB systems not
    powerful enough
  • Must perform parallel query execution, data
    loading, table scanning, and data archiving and
    back-up
  • Two common architectures for DB server platforms
    for DWing Symmetric Multi-Processing
    (SMP) - Tightly coupled processors - Shared
    memory and disk storage Massively Parallel
    Processing (MPP) - Loosely coupled
    processors - Each has own memory disk storage

8
DW Meta-Data
  • Data dictionary for the system system database
  • Multiple functionalities within the DW
    Transformation loading ? describes source data
    its modificationsManagement ? describes data
    as it is storedQuery Manager ? uses generates
    meta-data for queries
  • Synchronization of meta-data 2 standards were
    proposed - Object Information Model (OIM) by
    Meta Data Coalition (MDC) - Common Warehouse
    Meta-model (CWM) by Object Management Group (OMG)
  • Work is being done to merge the two standards to
    allow users to exchange meta-data using a single
    standard

9
Administration Management Tools
  • Monitoring data loading from multiple sources
  • Data Quality and Integrity checks
  • Managing and updating meta-data
  • Monitoring DB performance to ensure efficient
    query response times and resources utilization
  • Auditing DW usage to provide user chargeback
    information
  • Replicating, subsetting, and distributing data
  • Maintaining efficient data storage management
  • Purging data
  • Archiving and backing-up data
  • Implementing recovery following failure
  • Security management

10
Data Marts
  • Definition A subset of a DW that supports the
    requirements of a particular department or
    business function.
  • Normally has summary data not detailed
    operational data
  • Can be either an independent unit or can be
    centrally linked to a corporate DW
  • Usually department specific
  • Easy to understand and navigate due to small size
  • Can be built with different approaches many
    individual DMs that can be later integrated to
    make a DW the entire DW along with 1 or 2 DMs
    to meet immediate needs

11
Data Marts (contd.)
12
Why Data Marts??
  • Data that users need and analyze most often
  • Data formatted to suit the collective view of a
    group of users
  • Improved end-user response time
  • Appropriate structuring of data
  • Easier and simpler implementation as compared to
    a DW
  • Lesser cost
  • More definitive set of users to target

13
Data Marts Issues
  • DM Functionality scalability to larger
    sizes provide sophisticated analysis
    simultaneous remote access to many users
  • DM Size increase in size deteriorates
    performance
  • DM Load performance end-user response time vs.
    data loading performance
  • Users access to data in multiple DMs
    replication of data b/w DMs virtual DMs

14
Data Marts Issues (contd.)
  • DM Internet/Intranet Access low cost access to
    DM/DW using web browsers
  • DM Administration more DMs lead to
    inconsistency issues versioning, data
    meta-data consistency integrity checks,
    enterprise wide security, and performance tuning
  • DM Installation getting more complex over
    time DMs in a box low cost tools

15
Conclusion
  • Tools for Extraction, Cleansing, Transformation
    and Management
  • Parallel DBMSs for DW DBMS requirements
  • DW meta-data and its synchronization
  • What, where, how and why of Data Marts

16
Questions ?
17
  • Which of the following statements is false?a) DM
    can be a standalone unit.b) DM is a subset of a
    DW supporting departmental requirements.c) DW
    must be the source of data for DM.d) DMs have
    better response times due to the reduced volume
    of data.e) Multiple DMs can be integrated to
    form a DW.
  • Code Generators, DB data replication tools,
    dynamic transformation engines are otherwise
    collectively known as _________________.a)
    Integrated Technologiesb) Integrated Toolsc)
    Interrogated Technologiesd) Integrated
    Solutionse) Intrigued Tools
  • Which of the following is false concerning
    Parallel DBMSs? a) are characterized by their
    scalability, operability and availabilityb)
    perform many tasks distributed over multiple
    processors simultaneously c) require multiple
    processors, memory, disks, n/w connections for
    efficient functionalityd) are suitable for
    implementing DW DBMSse) none of these

18
  • Which of the following statements is false?a) DM
    can be a standalone unit.b) DM is a subset of a
    DW supporting departmental requirements.c) DW
    must be the source of data for DM.d) DMs have
    better response times due to the reduced volume
    of data.e) Multiple DMs can be integrated to
    form a DW.
  • Code Generators, DB data replication tools,
    dynamic transformation engines are otherwise
    collectively known as _________________.a)
    Integrated Technologiesb) Integrated Toolsc)
    Interrogated Technologiesd) Integrated
    Solutionse) Intrigued Tools
  • Which of the following is false concerning
    Parallel DBMSs? a) are characterized by their
    scalability, operability and availabilityb)
    perform many tasks distributed over multiple
    processors simultaneously c) require multiple
    processors, memory, disks, n/w connections for
    efficient functionalityd) are suitable for
    implementing DW DBMSse) none of these
Write a Comment
User Comments (0)
About PowerShow.com