Components and Architecture - PowerPoint PPT Presentation

1 / 29
About This Presentation
Title:

Components and Architecture

Description:

Title: CS 524 High Performance Computing Author: Asim Karim Last modified by: Asim Karim Created Date: 9/1/2002 7:03:32 AM Document presentation format – PowerPoint PPT presentation

Number of Views:52
Avg rating:3.0/5.0
Slides: 30
Provided by: AsimK1
Category:

less

Transcript and Presenter's Notes

Title: Components and Architecture


1
Components and Architecture
  • CS 543 Data Warehousing

2
Architecture
  • What are the key components of a data warehouse?
  • Architecture is the structure that binds the
    components into an integrated whole
  • DW architecture provides the overall framework
    for developing and deploying DW solutions

3
Architectural Components
4
Distinguishing Characteristics
  • Different objectives and scope
  • Data content
  • Complex analysis and quick response
  • Flexible and dynamic
  • Metadata driven

5
Architecture Supporting Flow of Data
6
Technical Architecture
  • The technical architecture of a DW is the
    complete set of functions and services provided
    within its components
  • Functions
  • Services
  • Rules and procedures
  • Data stores
  • Tools are the means to implement an architecture
  • Architecture comes first, then the tools select
    the appropriate tools based on the architecture

7
Data Acquisition (1)
  • This component includes
  • Extraction
  • Transfer into staging area
  • Preparation for loading (transformation,
    cleansing, and integration)

8
Data Acquisition (2)
9
Data Acquisition Functions and Services (1)
  • Data extraction
  • Select data sources and determine the types of
    filters to apply to individual sources
  • Generate automatic extract files from operational
    systems using replication and other techniques
  • Create intermediary files to store selected data
    to be merged later
  • Transport extracted files from multiple platforms
  • Provide automated job control services for
    creating extract files
  • Reformat input from outside sources, departmental
    files, databases, and spreadsheets
  • Resolve inconsistencies for common data elements
    from multiple sources
  • Generate common application code for data
    extraction

10
Data Acquisition Functions and Services (2)
  • Data transformation
  • Map input data to data for DW repository
  • Clean data, remove duplicates, merge/purge
  • De-normalize extracted data structures as
    required by the dimensional model of the DW
  • Convert data types
  • Calculate and derive attribute values
  • Check for referential integrity
  • Aggregate data as needed
  • Resolve missing values
  • Consolidate and integrate data

11
Data Acquisition Functions and Services (3)
  • Data staging
  • Provide backup and recovery for staging area
    repository
  • Sort and merge files
  • Create files as input to make changes to
    dimension tables
  • If staging area storage is a relational database,
    create and populate database

12
Data Storage
  • This architectural component covers the process
    of loading the prepared data from the data
    staging area into the data warehouse repository

13
Data Storage Functions and Services
  • Load data for full refreshes of DW tables
  • Perform incremental loads at regular prescribed
    intervals
  • Support loading into multiple tables at the
    detailed and summarized levels
  • Optimize the loading process
  • Provide automated job control services for
    loading the data warehouse
  • Provide backup and recovery for the DW database
  • Provide security
  • Monitor and fine-tune the database
  • Periodically archive data from the database
    according to preset conditions

14
Information Delivery (1)
  • This architectural component spans a broad
    spectrum of many different methods of making
    information available to the users of the DW
  • To the users, information delivery is the DW it
    is the front-end through which the users retrieve
    information from the DW
  • Information
  • Online queries and interactive analyses
  • Regular and ad-hoc reports
  • Specialized applications (e.g. executive
    information system)
  • Data mining

15
Information Delivery (2)
16
Information Delivery Functions and Services
  • Provide security to control information access
  • Monitor user access to improve service and for
    future enhancements
  • Allow users to browse data warehouse content
  • Simplify access by hiding internal complexities
    of data storage from users
  • Automatically reformat queries for optimal
    execution
  • Enable queries to be aware of aggregate tables
    for faster results
  • Govern queries and control runaway queries
  • Provide self-service report generation for users
  • Store result sets for queries and reports for
    future use
  • Provide multiple levels of data granularity
  • Provide event triggers to monitor data loading
  • Make provision for the users to perform complex
    analysis
  • Enable data feeds to downstream, specialized data
    support systems such as EIS and data mining

17
Infrastructure Supporting Architecture
  • The architecture defines the functions and
    services the infrastructure defines the elements
    to support the architecture
  • Infrastructure is the foundation supporting the
    architecture
  • Hardware servers
  • OSs
  • Data management systems
  • Networking elements
  • Supporting tools and applications
  • People
  • Procedures

18
Operational Infrastructure
  • Operational infrastructure includes
  • People
  • Procedures
  • Training
  • Management software
  • Operational infrastructure are the people and
    procedures that keep the DW functioning, and not
    those who develop the DW

19
Physical Infrastructure (1)
20
Physical Infrastructure (2)
  • Physical infrastructure includes
  • Computing hardware (e.g. server)
  • OS and utilities
  • Networking hardware and software
  • Software tools
  • Decisions about the physical infrastructure are
    critical for a DW. Two principles
  • Leverage as much of the existing physical
    infrastructure
  • Keep the infrastructure as modular as possible

21
Hardware and Operating System
  • Hardware
  • Scalability
  • Support
  • Vendor reference
  • Vendor stability
  • Operating system
  • Compatibility
  • Scalability
  • Security
  • Reliability
  • Availability
  • Preemptive multitasking
  • Multi-threaded approach
  • Memory protection

22
Single Platform Option
  • Simplest option, where all functions and services
    are performed by a single computing platform
  • Typically used by small to medium sized companies
    who have mainframes or large Unix servers already
    in use with capacity to spare
  • Some shortcomings of using mainframes
  • Stretched to capacity
  • Non availability of tools
  • Multiple legacy platforms
  • Companys migration policy

23
Hybrid Option
  • Most companies opt for the hybrid option where
    multiple platforms are used for data warehousing
    (data acquisition, data storage, information
    delivery)

24
Data Extraction
  • Data extraction
  • Best performed on each source systems own
    computing platform
  • Initial reformatting and merging
  • Best performed on each source systems own
    computing platform
  • Extract files are reformatted and merged into a
    smaller number of files performing verification
    against the source system
  • Initial data cleansing
  • Also performed on source system platform
  • Transformation and consolidation
  • Performed on the staging area platform
  • Validation and final quality check
  • Performed on the staging area platform
  • Creation of load images
  • Performed on the staging area platform

25
Options for the Data Staging Area
  • In one of the legacy platforms
  • On the data storage platform
  • On a separate optional platform
  • You can optimize the platform for complex
    transformations and cleaning
  • Install specialized tools for transformations
    and cleaning
  • Keep track of entire data content in the staging
    area

26
Data Movement
27
Client/Server Architecture (1)
28
Client/Server Architecture (2)
  • Application server (middle tier)
  • To run middleware and establish connectivity
  • To execute management and control software
  • To handle data access from the Web
  • To manage metadata
  • For authentication
  • As front end
  • For managing and running standard reports
  • For sophisticated query management
  • For OLAP applications

29
Maturing of the Infrastructure
Write a Comment
User Comments (0)
About PowerShow.com