Introduction to Data Mining

Provided by: hanys3
Learn more at: https://cs.nyu.edu

1
Introduction to Data Mining
  • Dr. Hany Saleeb

2
Why Data Mining? Potential Applications
  • Direct Marketing
      • Identify which prospects should be included in a
        mailing list
  • Market segmentation
      • Identify common characteristics of customers who
        buy the same products
  • Market Basket Analysis
      • Identify which products are likely to be bought
        together
  • Insurance Claims Analysis
      • Discover patterns of fraudulent transactions
      • Compare current transactions against those
        patterns

3
What Is Data Mining?
  • Combination of AI and statistical analysis to
    discover information that is hidden in the data
      • Associations (e.g. linking the purchase of pizza
        with beer)
      • Sequences (e.g. tying events together, such as
        marriage and the purchase of furniture)
      • Classifications (e.g. recognizing patterns such
        as the attributes of employees who are most
        likely to quit)
      • Forecasting (e.g. predicting buying habits of
        customers based on past patterns)
  • Expert systems or small ML/statistical programs
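
The pizza-and-beer association above can be made concrete with two standard measures, support and confidence. The sketch below is illustrative only: the baskets are made-up toy data and the helper names (`support`, `confidence`, `pair_counts`) are assumptions, not part of the original slides.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets; the items are illustrative, not real data.
transactions = [
    {"pizza", "beer", "chips"},
    {"pizza", "beer"},
    {"pizza", "soda"},
    {"beer", "chips"},
    {"pizza", "beer", "soda"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Among baskets containing antecedent, the fraction that also contain consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [b for b in baskets if antecedent <= b]
    if not with_antecedent:
        return 0.0
    return sum(consequent <= b for b in with_antecedent) / len(with_antecedent)


# Count how often each pair of items is bought together.
pair_counts = Counter(
    pair
    for basket in transactions
    for pair in combinations(sorted(basket), 2)
)

print(pair_counts.most_common(1))                      # most frequent pair
print(support({"pizza", "beer"}, transactions))        # 3 of 5 baskets
print(confidence({"pizza"}, {"beer"}, transactions))   # 3 of the 4 pizza baskets
```

Counting every pair is only practical for small item catalogues; real association miners prune the candidate itemsets, which this sketch does not attempt.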

4
What can data mining do?
  • Classification
      • Classify credit applicants as low, medium, or high
        risk
      • Classify insurance claims as normal or suspicious
  • Estimation
      • Estimate the probability of a direct mailing
        response
      • Estimate the lifetime value of a customer
  • Prediction
      • Predict which customers will leave within six
        months
      • Predict the size of the balance that will be
        transferred by a credit card prospect
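
As a rough illustration of the classification task on this slide, the sketch below labels a credit applicant as low, medium, or high risk by majority vote among the nearest labelled examples (k-nearest neighbours). The slides do not name a particular algorithm; the technique, the two features, and all the data values here are assumptions chosen for illustration.

```python
from collections import Counter
from math import dist

# Hypothetical labelled applicants:
# (debt-to-income ratio, fraction of late payments) -> risk class
training = [
    ((0.10, 0.00), "low"),
    ((0.15, 0.05), "low"),
    ((0.20, 0.02), "low"),
    ((0.35, 0.10), "medium"),
    ((0.40, 0.15), "medium"),
    ((0.45, 0.12), "medium"),
    ((0.70, 0.40), "high"),
    ((0.80, 0.35), "high"),
    ((0.75, 0.50), "high"),
]


def classify(applicant, examples, k=3):
    """Return the majority class among the k examples closest to applicant."""
    nearest = sorted(examples, key=lambda ex: dist(ex[0], applicant))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(classify((0.12, 0.01), training))
print(classify((0.38, 0.11), training))
print(classify((0.78, 0.42), training))
```

Both features are already on a 0 to 1 scale here; with features on different scales, they would need to be normalised before distances are compared.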

5
What can data mining do? (cont'd)
  • Association
      • Find out which items customers are likely to buy
        together
      • Find out what books to recommend to Amazon.com
        users
  • Clustering
      • Difference from classification: classes are
        unknown!
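
To show what "classes are unknown" means in practice, here is a minimal one-dimensional k-means sketch: it receives only unlabelled values and discovers the groups itself. The slides do not prescribe k-means; the algorithm choice, the function names, and the spending figures are assumptions for illustration.

```python
def assign(values, centers):
    """Group each value with its nearest center (ties go to the earlier center)."""
    groups = [[] for _ in centers]
    for v in values:
        nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
        groups[nearest].append(v)
    return groups


def kmeans_1d(values, initial_centers, max_iterations=20):
    """Alternate assignment and center updates until the centers stop moving."""
    centers = list(initial_centers)
    for _ in range(max_iterations):
        groups = assign(values, centers)
        # An empty group keeps its previous center.
        new_centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(values, centers)


# Hypothetical monthly spend per customer; no class labels are given.
spend = [12, 15, 14, 95, 102, 98, 11, 100]
centers, groups = kmeans_1d(spend, initial_centers=[0, 50])
print(centers)
print(groups)
```

Unlike the classification case, nothing tells the algorithm what the groups mean; interpreting them (for example as "occasional" versus "heavy" spenders) is left to the analyst.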

6
Market Analysis and Management
  • Where are the data sources for analysis?
      • Credit card transactions, loyalty cards, discount
        coupons, customer complaint calls, plus (public)
        lifestyle studies
  • Target marketing
      • Find clusters of model customers who share the
        same characteristics: interests, income level,
        spending habits, etc.
  • Determine customer purchasing patterns over time
      • Conversion of a single to a joint bank account:
        marriage, etc.
  • Cross-market analysis
      • Associations/correlations between product sales
      • Prediction based on the association information

7
Data Mining: Confluence of Multiple Disciplines
  • Database Technology
  • Statistics
  • Machine Learning
  • Visualization
  • Information Science
  • Other Disciplines

8
Data Mining: On What Kind of Data?
  • Relational databases
  • Data warehouses
  • Transactional databases
  • Advanced DB and information repositories
      • Object-oriented and object-relational databases
      • Spatial databases
      • Time-series data and temporal data
      • Text databases and multimedia databases
      • Heterogeneous and legacy databases
      • WWW

9
Data Mining Process
  • Learning
  • Collecting relevant data
  • Model building
  • Understanding of business; problem identification
  • Business strategy and evaluation
  • Action

10
Requirements/challenges in Data Mining
  • User interface
  • Mining methodology
  • Performance
  • Data source
  • Social and Security

11
Requirements/challenges in Data Mining (2)
  • User interface
      • Data visualization
          • Understandability and interpretation of results
          • Information representation and rendering
          • Screen real estate
      • Interactivity
          • Manipulation of mined knowledge
          • Focus and refine mining tasks
          • Focus and refine mining results

12
Requirements/challenges in Data Mining (3)
  • Mining Methodology
      • Mining different kinds of knowledge in databases
      • Interactive mining of knowledge at multiple
        levels of abstraction
      • Incorporation of background knowledge
      • Query languages
      • Expression and visualization of results
      • Handling noise and incomplete data
      • Pattern evaluation
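
One common way to handle incomplete data, among the methodology issues listed above, is to fill missing values with a robust summary such as the median. The sketch below assumes missing entries are represented as `None`; the function name `impute_median` and the sample ages are hypothetical.

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the known values."""
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("cannot impute: no known values")
    fill = median(known)
    return [fill if v is None else v for v in values]


# Hypothetical customer ages with two missing entries.
ages = [30, None, 45, 38, None, 41]
print(impute_median(ages))
```

The median is used rather than the mean because it is less sensitive to noisy outliers, though any imputation changes the data and should be reported alongside the mined results.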

13
Requirements/challenges in Data Mining (4)
  • Performance
      • Efficiency and scalability of data mining
        algorithms
      • Linear algorithms needed
      • Parallel and distributed methods
      • Incremental methods
      • Divide and conquer?
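
As a small example of what "linear" and "incremental" can mean, the sketch below keeps a running mean and variance that are updated one record at a time (Welford's online algorithm), so the data is scanned once and never has to be held in memory. The slides do not name this algorithm; it is chosen here only as a familiar instance, and the class name `RunningStats` is an assumption.

```python
class RunningStats:
    """Mean and sample variance maintained incrementally (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new observation into the statistics in O(1) time."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; defined as 0.0 until at least two values are seen."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for amount in [2, 4, 4, 4, 5, 5, 7, 9]:  # hypothetical transaction amounts
    stats.update(amount)
print(stats.n, stats.mean, stats.variance)
```

Because each update depends only on the current summary and the new value, new transactions can be folded in as they arrive without re-mining the full history.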

14
Requirements/challenges in Data Mining (5)
  • Data Source
      • Diversity of data types
          • Handling complex types of data
          • Mining information from heterogeneous databases
            or information repositories
          • Can we expect a DM algorithm to do well on all
            types of data?
      • Data glut
          • Are we collecting the right data for the right
            answer?
          • Distinguish between important and unimportant data

15
Requirements/challenges in Data Mining (6)
  • Social and Security
      • Social impact
          • Private and sensitive data is gathered and mined
            without individuals' knowledge and/or consent
          • Appropriate use and distribution of discovered
            knowledge
      • Regulations
          • Need for privacy and DM policies

16
Data Mining Tools

17
Summary
  • The benefit of knowing one's business is
    critical; technologies are coming together to
    support data mining.
  • Data mining is both the process and the result of
    knowledge production, knowledge discovery, and
    knowledge management.