Collection of general data mining briefings - PowerPoint PPT Presentation

Title:

Collection of general data mining briefings

Slides: 21
Provided by: chrisc8

Transcript and Presenter's Notes


1
Data and Applications Security
Introduction to Data Mining
Dr. Bhavani Thuraisingham
Guest Lecture

February 25, 2008
2
Objective of the Unit
  • This unit provides an introduction to data mining

3
Outline of Data Mining
  • What is Data Mining?
  • Data warehousing vs data mining
  • Steps to Data Mining
  • Need for Data Mining
  • Example Applications
  • Technologies for Data Mining
  • Why Data Mining Now?
  • Preparation for Data Mining
  • Data Mining Tasks, Methodology, Techniques
  • Commercial Developments
  • Status, Challenges, and Directions

4
What is Data Mining?
5
Data Warehouses vs Data Mining
  • Goal: improved business efficiency
  • Improve marketing (advertise to the most likely
    buyers)
  • Inventory reduction (stock only needed
    quantities)
  • Information source: historical business data
  • Example: supermarket sales records
  • Size ranges from 50k records (research studies)
    to terabytes (years of data from chains)
  • Data is already being warehoused
  • Sample question: What products are generally
    purchased together?
  • The answers are in the data; we need to MINE the
    data
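The sample question above ("what products are generally purchased together?") can be illustrated with a minimal sketch that counts co-occurring item pairs. The basket contents, the function names, and the 0.4 support threshold are all hypothetical and chosen only for illustration; real market-basket tools use more scalable algorithms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical supermarket baskets; each set is one sales record.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"beer", "chips", "milk"},
]


def pair_counts(transactions):
    """Count how many transactions contain each unordered pair of items."""
    counts = Counter()
    for items in transactions:
        # sorted() gives every pair one canonical order, so (a, b) == (b, a).
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return counts


def frequent_pairs(transactions, min_support):
    """Return pairs whose support (fraction of transactions) meets the threshold."""
    n = len(transactions)
    return {
        pair: count / n
        for pair, count in pair_counts(transactions).items()
        if count / n >= min_support
    }


result = frequent_pairs(baskets, min_support=0.4)
for pair, support in sorted(result.items()):
    print(pair, support)
```

Counting every pair is quadratic in basket size, which is acceptable for a toy example but is one reason production tools prune candidates before counting.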

6
What Does Warehousing do for Data Mining?
  • Difficult to mine disparate data sources
  • Data warehouse integrates the disparate data
    sources into a single logical entity
  • Maintains integrity of the data
  • Scrubbing and Cleaning
  • Formats the data for querying and mining
  • Multidimensional data
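As a rough sketch of what "integrating disparate data sources" and "scrubbing and cleaning" might look like in practice, the example below maps two differently shaped record sets onto one schema and then normalizes them. The source names, field names, and the rule that identical records count as duplicates are assumptions made for illustration, not part of the lecture.

```python
# Hypothetical records from two source systems with different field names.
store_a = [
    {"cust": " Alice ", "item": "Milk", "qty": "2"},
    {"cust": "bob", "item": "bread", "qty": "1"},
]
store_b = [
    {"customer_name": "ALICE", "product": "milk", "quantity": 2},
    {"customer_name": "Carol", "product": "Eggs", "quantity": None},
]

# Maps each source's field names onto one shared schema (assumed names).
FIELD_MAPS = {
    "store_a": {"cust": "customer", "item": "product", "qty": "quantity"},
    "store_b": {
        "customer_name": "customer",
        "product": "product",
        "quantity": "quantity",
    },
}


def integrate(source_name, rows):
    """Rename each row's fields so all sources share a single logical schema."""
    mapping = FIELD_MAPS[source_name]
    return [{mapping[key]: value for key, value in row.items()} for row in rows]


def scrub(rows):
    """Drop incomplete rows, normalize text and types, and remove exact duplicates."""
    cleaned, seen = [], set()
    for row in rows:
        if row["quantity"] is None:
            continue  # incomplete record: no quantity was captured
        record = (
            row["customer"].strip().lower(),
            row["product"].strip().lower(),
            int(row["quantity"]),
        )
        if record in seen:
            continue  # assumption: identical records are duplicates
        seen.add(record)
        cleaned.append(dict(zip(("customer", "product", "quantity"), record)))
    return cleaned


warehouse = scrub(integrate("store_a", store_a) + integrate("store_b", store_b))
print(warehouse)
```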

7
Is it Necessary to Have a Data Warehouse for Data
Mining?
  • Key to successful data mining is having good data
  • Data warehousing integrates heterogeneous data
    sources, formats the data, and facilitates
    interactive query processing
  • Having a data warehouse is good for data mining,
    but perhaps not essential
  • Data mining tools could be used directly on
    good/clean databases

8
What's going on in data mining?
  • What are the technologies for data mining?
  • Database management, data warehousing, machine
    learning, statistics, pattern recognition,
    visualization, parallel processing
  • What can data mining do for you?
  • Data mining outcomes: classification, clustering,
    association, anomaly detection, prediction,
    estimation, ...
  • How do you carry out data mining?
  • Data mining techniques: decision trees, neural
    networks, market-basket analysis, link analysis,
    genetic algorithms, ...
  • What is the current status?
  • Many commercial products mine relational
    databases
  • What are some of the challenges?
  • Mining unstructured data; extracting useful
    patterns; web mining; data mining, national
    security, and privacy

9
Steps to Data Mining
[Flow diagram: Data Sources -> Integrate data sources -> Clean/modify data sources -> Mine the data -> Examine results/prune results -> Report final results -> Take actions]
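The steps on this slide can be sketched as a chain of small functions. This is only an illustration: it assumes the steps run in the order integrate, clean, mine, examine/prune, report, and the "mining" step here is a simple frequency count standing in for a real technique. All function names and sample records are hypothetical.

```python
from collections import Counter


def integrate(sources):
    """Integrate data sources: concatenate the records from every source."""
    return [record for source in sources for record in source]


def clean(records):
    """Clean/modify: drop missing or blank values and normalize case."""
    return [r.strip().lower() for r in records if r and r.strip()]


def mine(records):
    """Mine the data: here, just count how often each value occurs."""
    return Counter(records)


def prune(patterns, min_count):
    """Examine/prune results: keep patterns seen at least min_count times."""
    return {p: c for p, c in patterns.items() if c >= min_count}


def report(patterns):
    """Report final results as readable lines, most frequent first."""
    ordered = sorted(patterns.items(), key=lambda pc: (-pc[1], pc[0]))
    return [f"{pattern}: {count}" for pattern, count in ordered]


# Two hypothetical event logs acting as the data sources.
sources = [
    ["Login", "login ", "", "Transfer"],
    ["LOGIN", "transfer", "Logout", None],
]
lines = report(prune(mine(clean(integrate(sources))), min_count=2))
print("\n".join(lines))
```

The final step, taking action on the reported results, is left to the analyst and is not modeled here.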
10
Knowledge Directed to Data Mining
[Flow diagram: Data Sources, Integrate data sources, Clean/modify data sources, Mine the data, Examine results/prune results, Report final results, Take actions, with an Expert System component added to the process]
11
Need for Data Mining
  • Large amounts of current and historical data
    being stored
  • As databases grow larger, decision-making from
    the data is not possible; we need knowledge
    derived from the stored data
  • Data from multiple data sources and multiple
    domains
  • Medical, Financial, Military, etc.
  • Need to analyze the data
  • Support for planning (historical supply and
    demand trends)
  • Yield management (scanning airline seat
    reservation data to maximize yield per seat)
  • System performance (detect abnormal behavior in a
    system)
  • Mature database analysis (clean up the data
    sources)

12
Example Applications
  • A medical supplies company increases sales by
    targeting in its advertising the physicians who
    are likely to buy its products
  • A credit bureau limits losses by selecting
    candidates who are unlikely to default on their
    payments
  • An intelligence agency detects abnormal
    behavior among its employees
  • An investigation agency identifies fraudulent
    behavior by certain individuals

13
Integration of Multiple Technologies
[Diagram: Data Mining integrates Data Warehousing, Machine Learning, Database Management, Statistics, Parallel Processing, and Visualization]
14
Why Data Mining Now?
  • Large amounts of data are being produced
  • Data is being organized
  • Technologies are developing for database
    management, data warehousing, parallel
    processing, machine intelligence, etc.
  • It is now possible to mine the data and get
    patterns and trends
  • Interesting applications exist

15
Preparation for Data Mining
  • Getting the data into the right format
  • Data warehousing
  • Scrubbing and cleaning the data
  • Some idea of application domain
  • Determining the types of outcomes
  • e.g., Clustering, classification
  • Evaluation of tools
  • Getting the staff trained in data mining

16
Some Types of Data Mining (Data Mining
Tasks/Outcomes)
  • Classification: grouping records into meaningful
    subclasses
  • e.g., a marketing organization has a list of people
    living in Manhattan who all own cars costing over
    $20K
  • Sequence detection
  • e.g., John always buys groceries after going to the
    bank
  • Data dependency analysis: identifying
    potentially interesting dependencies or
    relationships among data items
  • e.g., if John, James, and Jane meet, Bill is also
    present
  • Deviation detection: discovery of significant
    differences between an observation and some
    reference
  • Anomalous instances
  • Discrepancies between observed and expected values
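Two of the tasks above lend themselves to a short sketch. The first function estimates how reliably a dependency such as "if John, James, and Jane meet, Bill is also present" holds in a set of observed meetings; the second flags observations that deviate strongly from the mean of a reference series. The meeting records, login counts, function names, and the threshold of two standard deviations are all hypothetical choices for illustration.

```python
from statistics import mean, stdev


def rule_confidence(events, antecedent, consequent):
    """Fraction of events containing all of `antecedent` that also contain `consequent`."""
    matching = [event for event in events if antecedent <= event]
    if not matching:
        return 0.0
    return sum(1 for event in matching if consequent in event) / len(matching)


def deviations(values, threshold=2.0):
    """Return values more than `threshold` sample standard deviations from the mean."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical meeting attendance records for dependency analysis.
meetings = [
    {"John", "James", "Jane", "Bill"},
    {"John", "James", "Jane", "Bill", "Ann"},
    {"John", "Jane"},
    {"John", "James", "Jane", "Bill"},
]
confidence = rule_confidence(meetings, {"John", "James", "Jane"}, "Bill")

# Hypothetical daily login counts for deviation detection.
daily_logins = [20, 22, 19, 21, 20, 23, 18, 21, 95]
outliers = deviations(daily_logins)

print(confidence, outliers)
```

A single extreme value inflates the standard deviation it is measured against, so in practice a robust reference such as the median is often preferred; the mean is used here only to keep the sketch short.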

17
Data Mining Methodology (or Approach)
  • Top-down
  • Hypothesis testing
  • Validate beliefs
  • Bottom-up
  • Discover patterns
  • Directed
  • Some idea what you want to get
  • Undirected
  • Start from fresh

18
Some Data Mining Techniques
  • Market Basket analysis
  • Decision Trees
  • Neural networks
  • Rough sets and fuzzy logic
  • Inductive logic programming
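To give a feel for how a decision tree chooses its tests, the sketch below finds a single split on one numeric attribute by minimizing weighted Gini impurity, which is one common splitting criterion (the lecture does not say which criterion is intended). A full decision tree would apply this step recursively. The data and names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Find the threshold on one numeric feature that minimizes weighted Gini impurity."""
    best_threshold, best_score = None, float("inf")
    ordered = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(ordered, ordered[1:]):
        threshold = (low + high) / 2
        left = [lab for val, lab in zip(values, labels) if val <= threshold]
        right = [lab for val, lab in zip(values, labels) if val > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(values)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Hypothetical data: car price in thousands, and whether the owner
# responded to a marketing offer.
car_price = [12, 15, 18, 22, 25, 30]
responded = ["no", "no", "no", "yes", "yes", "yes"]

threshold, impurity = best_split(car_price, responded)
print(threshold, impurity)
```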

19
Commercial Developments in Data Mining: Some Early Products
  • Information Discovery - IDIS
  • WizSoft - WizWhy
  • Hugin - Hugin
  • IBM - Intelligent Miner
  • Red Brick - DataMind (became part of Informix and
    now part of IBM)
  • Neo Vista - Decision Series
  • Reduct Systems - Datalogic/R
  • Lockheed Martin - Recon
  • Nicesoft - Nicel
  • SAS - Enterprise Miner
  • Recent products will be discussed in Unit 9

20
Current Status, Challenges and Directions
  • Status
  • Data Mining is now a technology
  • Several prototypes and tools exist; many or
    almost all of them work on relational databases
  • Challenges
  • Mining large quantities of data; dealing with
    noise and uncertainty
  • Directions
  • Mining multimedia and text databases; Web mining
    (structure, usage, and content); data mining,
    national security, and privacy