Welcome! Knowledge Discovery and Data Mining - PowerPoint PPT Presentation

Slides: 20
Provided by: Qiang
Transcript and Presenter's Notes


1
Welcome! Knowledge Discovery and Data Mining
  • Qiang Yang
  • Hong Kong University of Science and Technology
  • qyang_at_cs.ust.hk
  • http://www.cs.ust.hk

2
Data Mining: An Example
  • You are a marketing manager for a brokerage
    company
  • Problem: Churn (also known as attrition) is too
    high
  • Turnover (after the six-month introductory
    period ends) is 40%
  • Customers receive incentives (average cost $160)
    when an account is opened
  • Giving new incentives to everyone who might leave
    is very expensive (as well as wasteful)
  • Bringing back a customer after they leave is both
    difficult and costly
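
To make the waste concrete, here is a rough arithmetic sketch in Python. It reads the slide's figures as a 40% turnover rate and an average incentive cost of 160 per customer. The cohort size of 1,000 customers and the function name are assumptions made only for illustration.

```python
def blanket_incentive_cost(customers, incentive_cost, churn_pct):
    """Cost of offering a new incentive to everyone, and the share of
    that cost spent on customers who would have stayed anyway."""
    total = customers * incentive_cost
    # Integer arithmetic keeps the result exact.
    wasted = total * (100 - churn_pct) // 100
    return total, wasted


# Hypothetical cohort of 1,000 customers; 160 and 40 are the slide's figures.
total, wasted = blanket_incentive_cost(1000, 160, 40)
print(total, wasted)  # 160000 96000
```

Under these assumptions, 60% of a blanket offer goes to customers who were never going to leave.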

3
A Solution
  • One month before the introductory period ends,
    predict which customers will leave
  • If you want to keep a customer that is predicted
    to churn, offer them something based on their
    predicted value
  • The ones that are not predicted to churn need no
    attention
  • If you don't want to keep the customer, do
    nothing
  • How can you predict future behavior?
  • Build models
  • Test models
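
The targeting rule on this slide can be sketched in a few lines of Python. This is only an illustration: the `Customer` fields, the 0.5 churn threshold, and the "offer 10% of predicted value" policy are all hypothetical choices, not something the slides specify.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    churn_probability: float  # hypothetical model output in [0, 1]
    predicted_value: float    # hypothetical estimate of future value


def plan_offer(customer, churn_threshold=0.5, offer_fraction=0.1):
    """Return the size of the retention offer for one customer."""
    if customer.churn_probability < churn_threshold:
        return 0.0  # not predicted to churn: needs no attention
    if customer.predicted_value <= 0:
        return 0.0  # predicted to churn, but not worth keeping: do nothing
    # Predicted to churn and worth keeping: scale the offer to predicted value.
    return round(offer_fraction * customer.predicted_value, 2)


print(plan_offer(Customer("loyal", 0.2, 1000.0)))     # 0.0
print(plan_offer(Customer("at risk", 0.8, 1000.0)))   # 100.0
print(plan_offer(Customer("low value", 0.9, -50.0)))  # 0.0
```

The churn probability and predicted value would come from models that are built and tested as the last bullets suggest; here they are simply given as inputs.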

4
Convergence of Three Technologies
5
Why Now? 1. Increasing Computing Power
  • Moore's law: computing power doubles every 18
    months
  • Powerful workstations became common
  • Cost-effective servers (SMPs) provide parallel
    processing to the mass market
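
As a quick illustration of what doubling every 18 months implies, the following Python sketch computes the growth factor over a given period. The function name and the example periods are assumptions; the 18-month doubling period is the slide's figure.

```python
def growth_factor(months, doubling_period=18):
    """Relative computing power after `months`, assuming power doubles
    every `doubling_period` months."""
    return 2 ** (months / doubling_period)


print(growth_factor(36))          # 4.0 (two doublings in three years)
print(round(growth_factor(120)))  # roughly a hundredfold over ten years
```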

6
2. Improved Data Collection
  • Data Collection → Access → Navigation → Mining
  • The more data the better (usually)

7
3. Improved Algorithms (AI and Data Base)
  • Techniques have often been waiting for computing
    technology to catch up
  • Statisticians were already doing manual data
    mining
  • Good machine learning is the intelligent
    application of statistical processes
  • A lot of data mining research focused on tweaking
    existing techniques to get small percentage gains

8
Definition: Predictive Model
  • A black box that makes predictions about the
    future based on information from the past and
    present
  • Large number of inputs usually available
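
A minimal Python sketch of the "black box" idea, under stated assumptions: the model is any function from inputs to a prediction, it is built from past examples, and it is tested on examples it has not seen. The single-feature threshold rule, the function names, and the toy data (for example, support calls per customer against whether the customer left) are hypothetical and chosen only to keep the example short.

```python
def build_threshold_model(rows, labels, feature=0):
    """Build step: pick the cut-off on one input feature that classifies
    the most training examples correctly, and return it as a black box."""
    best_cut, best_correct = None, -1
    for cut in sorted({row[feature] for row in rows}):
        correct = sum((row[feature] >= cut) == label
                      for row, label in zip(rows, labels))
        if correct > best_correct:
            best_cut, best_correct = cut, correct

    def model(row):
        # The caller only sees inputs going in and a prediction coming out.
        return row[feature] >= best_cut

    return model


def accuracy(model, rows, labels):
    """Test step: fraction of examples the model predicts correctly."""
    hits = sum(model(row) == label for row, label in zip(rows, labels))
    return hits / len(rows)


# Toy "past" data: one input per customer, label = did the customer leave?
train_rows = [[0], [1], [2], [5], [6], [7]]
train_labels = [False, False, False, True, True, True]
model = build_threshold_model(train_rows, train_labels)

# Held-out examples stand in for the "future".
test_rows = [[1], [6], [4]]
test_labels = [False, True, False]
print(accuracy(model, test_rows, test_labels))  # 1.0
```

Real models take many inputs at once; the point here is only the separation between building on past data and testing on unseen data.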

9
How are Models Built and Used?
  • View from 20,000 feet

10
The Data Mining Process
11
What the Real World Looks Like
12
Predictive Models are
  • Decision Trees
  • Nearest Neighbor Classification
  • Neural Networks
  • Rule Induction
  • K-means Clustering
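
As one concrete instance from this list, nearest neighbor classification can be sketched in plain Python as follows. The function name, the choice of k = 3, Euclidean distance, and the toy data are assumptions for illustration; the slides do not prescribe any of them.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote among the k training
    points closest to it (Euclidean distance).

    `train` is a list of (features, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data: two well-separated groups of customers (hypothetical features).
train = [
    ((0, 0), "stay"), ((0, 1), "stay"), ((1, 0), "stay"),
    ((5, 5), "churn"), ((5, 6), "churn"), ((6, 5), "churn"),
]
print(knn_predict(train, (0.5, 0.5)))  # stay
print(knn_predict(train, (5.5, 5.5)))  # churn
```

There is no separate training step here: the stored examples are the model, and all the work happens at prediction time.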

13
Data Mining is Not ...
  • Data warehousing
  • SQL / Ad Hoc Queries / Reporting
  • Software Agents
  • Online Analytical Processing (OLAP)
  • Data Visualization

14
Common Uses of Data Mining
  • Marketing
    • Direct mail marketing
    • Web site personalization
  • Fraud Detection
    • Credit card fraud detection
  • Science
    • Bioinformatics
    • Gene analysis
  • Web and Text analysis
    • Google

15
Course Description
  • Data Mining and Knowledge Discovery
  • Focus
    • Focus 1: Theoretical foundations in Pattern
      Recognition and Machine Learning
      • Algorithms
      • How do they differ?
      • Where do they apply?
    • Focus 2: Broad survey of recent research
    • Focus 3: Hands-on, apply algorithms to KDD data
      sets

16
Topic 1: Foundations
  • Classification algorithms
  • Clustering algorithms
  • Association algorithms
  • Sequential Data Mining
  • Novel Applications
    • Web
    • Customer Relationship Management
    • Biological Data

17
Topic 2: Hands On
  • Apply learned algorithms to selected data sets
  • Get familiar with existing software packages and
    libraries
  • Final Project will involve working with some
    datasets

18
Prerequisites
  • Statistics and Probability would help, but are
    not necessary
  • Pattern Recognition would help, but is not
    necessary
  • Databases
    • Knowledge of SQL and relational algebra would
      help, but is not necessary
  • One programming language
    • One of Java, C, Perl, Matlab, etc.
    • Will need to read Java Library
19
Grading
  • Grade Distribution
    • Assignments: 30%
    • Midterm Exam: 30%
    • Paper Presentation and Presentation: 40%