MD 240 Data Management: Warehousing, Analyzing, Mining and Visualization

Slides: 49
Provided by: WadeJa9

1
MD 240 Data Management: Warehousing, Analyzing,
Mining and Visualization
2
Agenda
  • Background
  • Data Management
  • Data Collection
  • Data Cleaning, Preparation, and Warehousing
  • Data Analysis
  • Visual Methods for Discovery and Presentation
  • Marketing Transaction Databases

3
Background
  • Until recently, it was difficult for analysts and
    managers to perform analyses related to their
    business activities
  • With the spread of PCs and networked devices
      • it has become easier than ever to collect data
        about activities in an organization
      • it has become more feasible to move analysis
        from the statistician in the back office to
        salespeople, managers, and analysts closer to
        the front office

4
Background
  • Difficulties with data analysis for business
    intelligence
      • Amount of data is increasing exponentially
      • Multiple sources of data, increasing all the
        time
      • Only a small portion of the total data collected
        is usually useful for making a decision
      • Increasing need for external data
      • Differing legal requirements about data
        collection in different countries
      • Selecting a data management tool from the many
        available tools
      • Data security, quality, integrity, etc.

5
Data Management
6
Data Management: Data Management Process
  • Data Life Cycle Process
      • Data collection
      • Data stored in databases
      • Pre-process databases
          • Clean out junk
          • Get data close to what decision-makers need
      • Transformation of data
          • Make it ready for analysis
      • Store in data warehouse
      • Use data mining tools to discover patterns
          • Create knowledge
      • Presentation of results
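
The life cycle above can be sketched as a small pipeline. This is only an illustrative sketch: the records, field names, and cleaning rules are invented assumptions, and a plain Python list stands in for the data warehouse.

```python
from statistics import mean

# Hypothetical raw records (assumed for illustration; not from the slides).
raw = [
    {"customer": "A", "revenue": "120.0", "units": "4"},
    {"customer": "B", "revenue": "", "units": "3"},       # junk: missing revenue
    {"customer": "C", "revenue": "90.0", "units": "0"},   # junk: zero units
    {"customer": "D", "revenue": "200.0", "units": "5"},
]

def preprocess(records):
    """Clean out junk: drop rows with missing revenue or non-positive units."""
    clean = []
    for r in records:
        if r["revenue"] and int(r["units"]) > 0:
            clean.append({
                "customer": r["customer"],
                "revenue": float(r["revenue"]),
                "units": int(r["units"]),
            })
    return clean

def transform(records):
    """Make data ready for analysis: derive revenue per unit."""
    return [dict(r, price_per_unit=r["revenue"] / r["units"]) for r in records]

# "Store in data warehouse": here just a list held in memory.
warehouse = transform(preprocess(raw))

# Stand-in for the analysis step: a simple summary statistic.
avg_price = mean(r["price_per_unit"] for r in warehouse)
print(f"{len(warehouse)} clean rows, average price per unit = {avg_price:.2f}")
```

In practice each stage would read from and write to a database rather than a list, but the order of the stages is the same.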

7
Data Management: Data Management Process
Step 1 Raw Data Collection
Step 2 Data Selection
Step 3 Pre-Process Data
Step 4 Transform Data
Step 5 Store in Data Warehouse
Step 6 Discover Patterns w/ Data Mining
Step 7 Interpret, Present, Use Results
[Diagram labels: Raw Data; Interesting Data; Clean, Usable Data; Data
Warehouse; Data; Transform; KDD Analysis; Act on Results. Variable
labels: X1,X2; V1 X1/X2; V1 V2 V3]
8
Data Management: Data Management Process
9
Data Collection
10
Step 1 Data Collection: Data Sources
11
Step 1 Data Collection: Data Strategy
  • Fundamental philosophy guiding data collection
      • GIGO: garbage in, garbage out

12
Step 1 Data Collection: Data Sources
  • Internal data
      • data/info. about organizational activities
  • Personal data
      • data/info. documenting employees' activities
  • External data
      • government, competitors, suppliers
  • The Internet
      • screen scraping data out of the browser
  • Commercial database services
      • Online databases

13
Step 1 Data Collection: Data Capture and Input
  • Past
      • Type in by hand
          • time consuming
          • costly
          • many typing errors
  • Now
      • Objective is to automate
          • save the paper-storage costs of leasing
            warehouses
          • faster access to documents and the information
            in them
      • Document Management Systems
          • scanners for digitizing archived paper documents
          • databases for archiving, search, retrieval

14
Step 1 Data Collection: Data Quality (DQ)
  • Intrinsic DQ
      • Accuracy, objectivity, believability, and
        reputation
  • Accessibility DQ
      • Accessibility and access security
  • Contextual DQ
      • Relevance, value added, timeliness, completeness
  • Representation DQ
      • Interpretability, ease of understanding, concise
        representation, and consistent representation
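
Two of the dimensions above, completeness and consistent representation, are easy to measure directly. The sketch below is a hedged illustration only: the sample records, the field names, and the rule that a state should be a two-letter uppercase code are assumptions made for the example.

```python
# Hypothetical customer records (assumed for illustration).
records = [
    {"id": 1, "state": "MA", "signup": "2002-07-01"},
    {"id": 2, "state": "Mass.", "signup": "2002-07-03"},
    {"id": 3, "state": None, "signup": "07/04/2002"},
]

def completeness(rows, field):
    """Contextual DQ: share of rows in which the field is filled in."""
    filled = sum(1 for r in rows if r[field] not in (None, ""))
    return filled / len(rows)

def consistently_coded(rows, field="state"):
    """Representation DQ: ids of rows whose field is a two-letter
    uppercase code (the assumed standard representation)."""
    return [
        r["id"]
        for r in rows
        if isinstance(r[field], str) and len(r[field]) == 2 and r[field].isupper()
    ]

print(f"state completeness: {completeness(records, 'state'):.0%}")
print("rows with standard state code:", consistently_coded(records))
```

Dimensions such as believability or reputation are judgments about the source and do not reduce to a check like this.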

15
Data Cleaning, Preparation, and Warehousing
16
Steps 2-5 Data Warehousing: Transactional
Processing
  • Store data in databases
  • Objectives of a transaction processing system (TPS)
      • Standardized transactions
      • Simple computations
          • non-complex
          • not very mathematically or statistically
            oriented
      • High volume
      • Low cost

17
Steps 2-5 Data Warehousing: Transaction vs.
Analytical Processing
  • Task objectives for a useful analytical data
    delivery system
      • Easy data access by end users
      • Quicker decision making
      • Accurate and effective decision making
      • Flexible decision making

18
Steps 2-5 Data Warehousing: Transaction vs.
Analytical Processing
  • Characteristics of a useful analytical data
    delivery system
      • Business representation of data for end users
      • Client-server or Web-based environment that
        provides end users with query and reporting
        capability
      • Server-based repository (data warehouse)

19
Steps 2-5 Data Warehousing: Data Warehouse and
Data Marts
  • Data Warehouse
      • establishes a data repository, that ...
      • makes operational data accessible in a form
        readily acceptable for analytical processing
        activities
  • Metadata
      • data summaries for faster indexing and searching
        within the data warehouse
      • information on how the data have been organized
  • Data Mart
      • dedicated to a functional area, or ...
      • dedicated to a regional area

20
Steps 2-5 Data Warehousing: Data Warehouse and
Data Marts
21
Steps 2-5 Data Warehousing: Characteristics of
Data Warehousing
  • Desirable Characteristics for a Data Warehouse
      • Organization
          • organized by subject; extraneous items removed
      • Consistency
          • identical measurement and representation of
            the same data
      • Time variant
          • varies over time (time-series data)
      • Nonvolatile
          • data are not updated once entered
      • Relational
          • table-based structure (RDBMS)

22
Steps 2-5 Data Warehousing: Characteristics of
Data Warehousing
  • Data Warehousing is most suitable for
    organizations in which
      • End users need to access large amounts of data
      • Operational data are stored in several different
        systems
      • Different systems represent the same data in
        different formats
      • Management relies on information for decision
        making
      • There is a large, diverse customer base
      • Extensive end-user computing is performed

23
Data Analysis
24
Step 6 Data Analysis: Knowledge Discovery in
Databases (KDD)
  • Foundations of KDD
      • Massive data collection
      • Powerful multiprocessor computers
      • Intelligent data mining algorithms
  • Analyst/manager activities
      • Ad Hoc Queries
      • OLAP Queries
      • Data Mining

25
Step 6 Data Analysis: Ad Hoc Queries
  • Ad Hoc Queries
      • Let users access, navigate, and explore data in
        real time to make business decisions
  • Ad hoc query tool requirements
      • Query creation is easy
      • Customized query creation
      • Easy-to-use interfaces for performing queries
      • Many data sources are supported
      • Seamless integration between analysis and
        reporting

26
Step 6 Data Analysis: OLAP Queries
  • OLAP
      • An approach by which important queries and
        calculations are turned into online tools that
        managers can use over and over again
      • Decision support software that allows the user to
        quickly analyze information that has been
        summarized into multidimensional views and
        hierarchies
  • MOLAP: multidimensional OLAP
  • ROLAP: OLAP using relational databases
  • WOLAP: web-based OLAP

27
Step 6 Data Analysis: OLAP Queries
  • Capabilities of Online Analytical Processing
    (OLAP)
      • Access very large amounts of data
      • Analyze the relationships between many types of
        business elements
      • Involve aggregated data
      • Compare aggregated data over hierarchical time
        periods
      • Present data in different perspectives
      • Involve complex calculations between data
        elements
      • Able to respond quickly to user requests
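
Comparing aggregated data over hierarchical time periods can be illustrated with a small roll-up. This is a minimal sketch on invented sales facts; the tuple layout, region names, and helper names (`roll_up`, `month`, `quarter`, `year`) are assumptions for the example, not part of any OLAP product.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sales facts: (day, region, amount). Assumed for illustration.
sales = [
    (date(2002, 1, 15), "East", 100.0),
    (date(2002, 2, 3), "East", 150.0),
    (date(2002, 4, 20), "West", 200.0),
    (date(2002, 5, 9), "East", 50.0),
]

# Levels of the time hierarchy: month -> quarter -> year.
def month(d):
    return f"{d.year}-{d.month:02d}"

def quarter(d):
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"

def year(d):
    return str(d.year)

def roll_up(facts, period):
    """Sum amounts by (time period, region) at the chosen hierarchy level."""
    totals = defaultdict(float)
    for day, region, amount in facts:
        totals[(period(day), region)] += amount
    return dict(totals)

by_year = roll_up(sales, year)        # most aggregated view
by_quarter = roll_up(sales, quarter)  # drill down one level
by_month = roll_up(sales, month)      # drill down again

for key in sorted(by_quarter):
    print(key, by_quarter[key])
```

Passing a finer-grained period function is the drill-down; passing a coarser one is the roll-up. A real OLAP server would typically precompute such aggregates so it can respond quickly.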

28
Step 6 Data Analysis: OLAP Queries
  • OLAP Advantages
      • Adapt existing decision-making tools to the WWW,
        integrate them with distributed data stores
      • Facilitates drill-down
  • OLAP Shortcomings
      • Retrospective in nature
      • More of a reporting-oriented tool
      • A discovery-oriented tool for flexible data
        analysis of data already known to have importance
      • Less of a prediction-oriented tool

29
Step 6 Data Analysis: Data Mining
  • Objectives of Data Mining
      • Automate discovery of previously unknown patterns
      • Automate prediction of
          • trends
          • behaviors
          • events

30
Step 6 Data Analysis: Data Mining
  • Nature and Characteristics
      • Data often buried deep within large databases
          • Data wants to be Free!
      • Data may be consolidated in a data warehouse or
        kept on Internet and intranet servers
      • Usually client-server architecture

31
Step 6 Data Analysis: Data Mining
  • Nature and Characteristics (cont'd)
      • Data mining tools extract information buried in
        corporate files or archived public records
      • The miner is often an end user
      • Striking it rich usually involves finding
        unexpected, valuable results
      • Parallel processing computers are often needed to
        make this analysis fast enough to be useful to
        managers

32
Step 6 Data Analysis: Data Mining
  • Common types of data mining
      • Mining of numerical data
      • Text mining: group documents or identify themes
        or information within documents
          • Documents
          • Web pages
      • Web site clickstream/event mining

33
Step 6 Data Analysis: Data Mining
  • Data Mining yields five types of information
  • Association
      • e.g., correlation = 0.5; slope between X and Y =
        0.73
  • Sequences
      • e.g., biggest, second biggest, etc.
  • Classifications
      • e.g., There are 3 types of competitors; use data
        mining to classify Firm X as a Type 1
        competitor
  • Clusters
      • e.g., We don't know how many types of customers
        there are; let's try to discover if we can
        identify some similar customer groups
  • Forecasting
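
The association example on this slide (a correlation and a slope between X and Y) can be computed as follows. This is a sketch using the standard Pearson correlation and least-squares slope formulas; the sample data and the function name `pearson_and_slope` are invented for illustration and do not reproduce the 0.5 and 0.73 figures quoted on the slide.

```python
from math import sqrt

def pearson_and_slope(xs, ys):
    """Return (Pearson correlation, least-squares slope of y on x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy), sxy / sxx

# Hypothetical paired observations (assumed for illustration).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r, slope = pearson_and_slope(xs, ys)
print(f"correlation = {r:.3f}, slope = {slope:.2f}")
```

The correlation measures how tightly X and Y move together, while the slope says how much Y changes per unit of X; the two answer different questions, which is why both are reported.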

34
Step 6 Data Analysis: Data Mining
Techniques/Tools
  • Computer Science
      • Case-based reasoning
      • Neural computing
      • Intelligent agents
      • Others: decision trees, genetic algorithms,
        nearest neighbor method, and rule reduction
  • Statistics
      • Cluster analysis
      • Most standard statistical tools (SAS, SPSS)
      • Optimization
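
Of the techniques listed, the nearest neighbor method is the simplest to show in a few lines. The sketch below classifies a firm by the label of its single closest labeled example (1-nearest neighbor). The features (a price index and a market share), the labeled firms, and the function name `classify` are all hypothetical.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

# Hypothetical labeled competitors: (price index, market share) -> type.
labeled = [
    ((1.0, 0.30), "Type 1"),
    ((1.1, 0.28), "Type 1"),
    ((0.7, 0.10), "Type 2"),
    ((0.6, 0.12), "Type 2"),
    ((1.5, 0.05), "Type 3"),
]

def classify(point, examples):
    """Return the label of the closest labeled example (1-nearest neighbor)."""
    _, label = min(examples, key=lambda ex: dist(point, ex[0]))
    return label

firm_x = (1.05, 0.27)  # hypothetical unlabeled firm
print("Firm X is classified as", classify(firm_x, labeled))
```

With features on very different scales, the larger-scaled feature dominates the distance, so real uses usually normalize the features first.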

35
Step 6 Data Analysis: Data Mining
Techniques/Tools
36
Step 6 Data Analysis: Data Mining Vendors
  • Vendors
      • SAS Enterprise Miner
      • SPSS Business Intelligence
      • Insightful (www.insightful.com)
      • Microsoft Research
      • IBM
      • Blue Martini
      • Amdocs
      • DBMiner (www.dbminer.com)
      • PrudSys (www.prudsys.de)
      • Boston area: Torrent (www.torrent.com),
        ThinkAnalytics (www.thinkanalytics.com)
  • Learning Resources
      • Association for Computing Machinery (ACM) SIGKDD
      • KDD2002 conference (July 2002)

37
Visual Methods for Discovery and Presentation
38
Steps 6-7 Data Visualization: Multidimensionality
  • Multidimensionality
      • real-world data typically have more than 2 or 3
        dimensions
      • managerial analyses may require presentation of
        up to 7 or 8 dimensions to fully communicate
        discoveries
  • Three factors
      • dimensions
      • measures
      • time
  • Solution
      • technology that is flexible enough so that data
        can be organized the way managers prefer to see
        the data

39
Steps 6-7 Data Visualization: Examples of
Variables
  • Dimensions
      • Products, salespeople, market segments, business
        units, geographical locations
  • Measures
      • Money, sales volume, head count, inventory,
        profit, actual versus forecasted
  • Time
      • Daily, weekly, monthly, quarterly, yearly

40
Steps 6-7 Data Visualization: Presenting
Multidimensional Data
  • Data visualization involves presentation of data
    by digital technology
      • graphical user interfaces
      • digital images
      • geographical information systems
      • multidimensional tables and graphs
      • virtual reality
      • three-dimensional presentations
      • animation

41
Steps 6-7 Data Visualization: Presenting
Multidimensional Data
  • Low Tech Solutions for a few dimensions
      • Multidimensional Tables
          • reduce many dimensions down to 2D table format
      • Slicing and Dicing
      • Data rotation
          • ability to easily switch the 3 variables being
            analyzed and rotate 3D graphs on a computer
            screen
  • High Tech Solutions for many dimensions
      • See Edward Tufte's books
          • The Visual Display of Quantitative Information
          • Envisioning Information
          • Visual Explanations
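
The low-tech ideas on this slide (reducing many dimensions to a 2D table, slicing, and rotating the view) can be sketched with plain dictionaries. The fact rows, the dimension names, and the `pivot` helper below are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical facts: (product, region, quarter, sales).
facts = [
    ("Widget", "East", "Q1", 10),
    ("Widget", "West", "Q1", 7),
    ("Gadget", "East", "Q1", 4),
    ("Widget", "East", "Q2", 12),
    ("Gadget", "West", "Q2", 9),
]

# Position of each dimension within a fact tuple; index 3 is the measure.
DIMS = {"product": 0, "region": 1, "quarter": 2}

def pivot(rows, row_dim, col_dim, where=None):
    """Reduce the facts to a 2D table of summed sales.

    `where` optionally fixes dimensions to single values (a slice).
    """
    table = defaultdict(lambda: defaultdict(int))
    for rec in rows:
        if where and any(rec[DIMS[d]] != v for d, v in where.items()):
            continue  # record falls outside the requested slice
        table[rec[DIMS[row_dim]]][rec[DIMS[col_dim]]] += rec[3]
    return {r: dict(cols) for r, cols in table.items()}

# Many dimensions down to a 2D table: product by region, summed over time.
by_product_region = pivot(facts, "product", "region")

# Slice: the same table restricted to one quarter.
q1_slice = pivot(facts, "product", "region", where={"quarter": "Q1"})

# Rotation: switch which dimensions form the rows and columns.
rotated = pivot(facts, "region", "quarter")

print(by_product_region)
print(q1_slice)
print(rotated)
```

Each call answers a different question from the same facts, which is the point of letting managers organize the data the way they prefer to see it.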

42
Steps 6-7 Data Visualization: Geographical
Information Systems (GIS)
  • GIS
      • A computer-based system for capturing, storing,
        checking, integrating, manipulating, and
        displaying data using digitized maps.
  • Plot data or present data analysis findings by
      • latitude and longitude
      • cities, major metropolitan areas
      • counties
      • states
      • nations

43
Steps 6-7 Data Visualization: Geographical
Information Systems (GIS)
  • Emerging GIS Applications
      • Sophisticated user interfaces
          • Multimedia, 3D graphics, animated and
            interactive maps
      • Integration of GIS and GPS
          • Reengineer aviation and shipping industries
      • Intelligent GIS (integration of GIS and expert
        systems, ES)
      • Hand-held applications
          • Deploy mapping tools to PDAs and Java-based
            cell phones
      • Web applications
          • ESRI's ArcData GIS

44
Steps 6-7 Data Visualization: Geographical
Information Systems (GIS)
  • Vendors
      • ESRI (www.esri.com)
          • Arc/Info
          • ArcData Online
            (www.esri.com/data/online/index.html)
  • Resources
      • www.gis.com
      • www.gisday.com
      • www.state.ma.us/mgis/
      • www.northeastarc.org

45
Steps 6-7 Data Visualization: Other
Visualization Tools
  • Visual Interactive Modeling
      • visual modeling of a system
  • Visual Interactive Simulation
      • a visual front end to a simulation program
      • presents animation of system activities and
        statistical results during a simulation run
      • Real-time simulation: users can interact with
        the simulation model (prototyping, training,
        entertainment, video games)
  • Virtual Reality
      • Simulated environments that attempt to fool the
        viewer into perceiving that they are within a 3D
        world
      • Usually involves a headset, gloves, and other
        forms of sensory input/output devices

46
Marketing Transaction Databases
47
Application Area, Marketing: Marketing Transaction
Database (MTD)
  • a new kind of database, oriented toward
    targeting and personalizing marketing messages in
    real time.

48
Application Area, Marketing: Marketing Transaction
Database (MTD)
  • Purpose: targeting and personalization
  • Structure: liquid, driven by real-time marketing
  • Updates: real-time
  • Data level: individual detail
  • Data type: demographic (descriptive), behavioral,
    derivative
  • Advantages: allows real-time analysis and
    decision-making, CRM
  • Issues: emerging, no standards, not integrated
    with other systems