Week 9 - PowerPoint PPT Presentation

Slides: 20
Provided by: Rid105

Transcript and Presenter's Notes

Title: Week 9


1
Week 9
  • Data Mining System
  • (Knowledge Data Discovery)

2
Case Scenario
  • ABC Enterprise is a multinational company that
    offers multimedia content services in several
    regions in Asia. It has more than 6 million
    content subscribers. For a company of this size,
    a major problem is maintaining a good
    relationship with its existing content
    subscribers. Every year, it has to offer good
    content promotions that suit its customers' needs.
    However, this is a difficult task because it
    holds a huge collection of data about its
    subscribers, who have different needs and
    lifestyles. Therefore, the CEO of the company, Mr.
    Ridzuan, wants a system built that can analyze
    the enormous volume of subscriber data and
    suggest which content promotions are suitable
    for them.

3
Knowledge Discovery and Data Mining
  • Knowledge Discovery in Databases (KDD) is a process of
    extracting previously unknown, valid, and
    actionable (understandable) information from
    large databases.
  • Data mining is a step in the KDD process of
    applying data analysis and discovery algorithms.
  • Relates to machine learning, pattern recognition,
    statistics, data visualization etc.

4
  • Knowledge discovery in databases (KDD) is the
    non-trivial process of identifying valid,
    potentially useful and ultimately understandable
    patterns in data.

[Figure: KDD process diagram. Labels: Operational Databases; Clean, Collect, Summarize; Data Warehouse; Data Preparation; Training Data; Data Mining; Model Patterns; Verification, Evaluation]

5
Why Mine Data?
  • Huge amounts of data being collected and
    warehoused
  • Walmart records 20 million transactions per day
  • health care transactions: multi-gigabyte
    databases
  • Mobil Oil: geological data of over 100 terabytes
  • Affordable computing
  • Competitive pressure
  • gain an edge by providing improved, customized
    services
  • information as a product in its own right

6
Data Mining Methods
  • Prediction Methods
  • using some variables to predict unknown or future
    values of other variables
  • Descriptive Methods
  • finding human-interpretable patterns describing
    the data

7
Data Mining Tasks
  • Classification
  • Clustering
  • Association Rule Discovery
  • Sequential Pattern Discovery

8
1. Classification
  • Data defined in terms of attributes, one of which
    is the class.
  • Find a model for class attribute as a function of
    the values of other(predictor) attributes, such
    that previously unseen records can be assigned a
    class as accurately as possible.
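
The definition above can be sketched with a very small rule learner. This is only an illustration, assuming categorical attributes and a 1R-style approach (pick the single predictor attribute whose value-to-majority-class rule makes the fewest training errors); the slides do not name a specific algorithm, and the attribute names and records below are made up.

```python
from collections import Counter, defaultdict

def train_one_rule(records, class_attr):
    """Learn a 1R-style model: one predictor attribute plus a
    value -> majority-class lookup table."""
    predictors = [a for a in records[0] if a != class_attr]
    # Fallback class for attribute values never seen in training.
    default = Counter(r[class_attr] for r in records).most_common(1)[0][0]
    best = None
    for attr in predictors:
        counts = defaultdict(Counter)
        for r in records:
            counts[r[attr]][r[class_attr]] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(rule[r[attr]] != r[class_attr] for r in records)
        if best is None or errors < best[0]:
            best = (errors, attr, rule)
    _, attr, rule = best
    return attr, rule, default

def classify(model, record):
    """Assign a class to a previously unseen record."""
    attr, rule, default = model
    return rule.get(record[attr], default)

# Hypothetical training records; "buys" is the class attribute.
training = [
    {"age": "young", "income": "low", "buys": "no"},
    {"age": "young", "income": "high", "buys": "yes"},
    {"age": "old", "income": "low", "buys": "no"},
    {"age": "old", "income": "high", "buys": "yes"},
    {"age": "young", "income": "high", "buys": "yes"},
]
model = train_one_rule(training, "buys")
prediction = classify(model, {"age": "old", "income": "high"})
print(model[0], prediction)
```

On this toy data the learner selects "income" because it separates the two classes without error, whereas "age" misclassifies two records.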

9
Classification Example

10
Classification: Direct Marketing
  • Goal: reduce the cost of soliciting (mailing) by
    targeting a set of consumers likely to buy a new
    product.
  • Data
  • for a similar product introduced earlier
  • we know which customers decided to buy and which
    did not; this buy / not buy decision is the
    class attribute
  • collect various demographic, lifestyle, and
    company-related information about all such
    customers as possible predictor variables.
  • Learn classifier model

11
2. Clustering
  • Given a set of data points, each having a set of
    attributes, and a similarity measure among them,
    find clusters such that
  • data points in one cluster are more similar to
    one another
  • data points in separate clusters are less similar
    to one another.
  • Similarity measures
  • Euclidean distance if attributes are continuous
  • Problem-specific measures
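
One way to make this concrete is k-means with Euclidean distance. The slides do not prescribe a clustering algorithm, so k-means is an assumption chosen here for illustration, as are the sample points and the choice of initial centroids.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points with continuous attributes."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def assign(points, centroids):
    """Put each point into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: euclidean(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters

def k_means(points, initial_centroids, max_iterations=20):
    """Alternate assignment and centroid update until centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        updated = [
            # Keep the old centroid if a cluster ends up empty.
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else old
            for old, members in zip(centroids, clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)

# Hypothetical two-attribute data points forming two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = k_means(points, [points[0], points[3]])
print(centroids)
print(clusters)
```

The initial centroids are fixed rather than random so that the result is reproducible; in practice they are usually chosen randomly and the run repeated.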

12
Clustering: Market Segmentation
  • Goal: subdivide a market into distinct subsets of
    customers where any subset may conceivably be
    selected as a market target to be reached with a
    distinct marketing mix.
  • Approach
  • collect different attributes on customers based
    on geographical and lifestyle-related
    information
  • identify clusters of similar customers
  • measure the clustering quality by observing
    buying patterns of customers in same cluster vs.
    those from different clusters.
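
The quality check in the last bullet could be sketched as follows. The sketch assumes that a buying pattern is the set of items a customer has purchased and that Jaccard similarity is used to compare two patterns; neither choice comes from the slides, and the customers and cluster labels are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Overlap of two purchase sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def within_vs_between(purchases, cluster_of):
    """Mean purchase similarity for customer pairs in the same cluster
    versus pairs drawn from different clusters."""
    within, between = [], []
    for c1, c2 in combinations(sorted(purchases), 2):
        similarity = jaccard(purchases[c1], purchases[c2])
        if cluster_of[c1] == cluster_of[c2]:
            within.append(similarity)
        else:
            between.append(similarity)

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return mean(within), mean(between)

# Hypothetical customers, their purchases, and their assigned clusters.
purchases = {
    "c1": {"movies", "music"},
    "c2": {"movies", "music", "games"},
    "c3": {"news", "sports"},
    "c4": {"news", "sports", "finance"},
}
cluster_of = {"c1": 0, "c2": 0, "c3": 1, "c4": 1}

within, between = within_vs_between(purchases, cluster_of)
print(within, between)
```

A clustering is doing useful work for segmentation when the within-cluster figure is clearly higher than the between-cluster figure.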

13
3. Association Rule Discovery
  • Given a set of records, each of which contains
    some number of items from a given collection
  • produce dependency rules which will predict
    occurrence of an item based on occurrences of
    other items
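
A dependency rule of this kind is usually scored with support and confidence. Those two measures are the standard ones for association rules but are not named in the slides, so treat them as an assumption of this sketch; the transactions are made up.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) for the rule
    antecedent --> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / base

# Hypothetical market-basket records.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diapers", "beer"}),
    frozenset({"milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers"}),
]

rule_support = support(transactions, {"diapers", "beer"})
rule_confidence = confidence(transactions, {"diapers"}, {"beer"})
print(rule_support, rule_confidence)
```

Here the rule {diapers} --> {beer} holds in 3 of the 5 transactions, and in 3 of the 4 transactions that contain diapers.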

14
Association Rule Discovery
Marketing and Sales Promotion Application
15
4. Sequential Pattern Discovery
  • Given a set of objects, each associated with its
    own timeline of events, find rules that predict
    strong sequential dependencies among different
    events, of the form (A B) (C) (D E) --> (F)

16
Sequential Pattern Discovery Examples
  • sequences in which customers purchase
    goods/services
  • understanding long-term customer behavior to
    support timely promotions.
  • In point-of-sale transaction sequences
  • Athletic Apparel Store
  • (Shoes) (Racket, Racketball) --> (Sports
    Jacket)
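
The athletic apparel rule can be checked against purchase histories with a small sketch like the one below. It assumes each customer's timeline is a list of transactions, each transaction a set of items, and scores the rule by confidence (a measure the slides do not name); the four customer histories are invented.

```python
def contains(sequence, pattern):
    """True if the itemsets in `pattern` appear in `sequence` in order,
    each inside its own, strictly later transaction."""
    position = 0
    for itemset in pattern:
        # Skip transactions that do not contain the current itemset.
        while position < len(sequence) and not itemset <= sequence[position]:
            position += 1
        if position == len(sequence):
            return False
        position += 1
    return True

def rule_confidence(sequences, antecedent, consequent):
    """Share of sequences matching the antecedent that also go on to
    match the consequent afterwards."""
    matching = [s for s in sequences if contains(s, antecedent)]
    if not matching:
        return 0.0
    followed = [s for s in matching if contains(s, antecedent + consequent)]
    return len(followed) / len(matching)

# Hypothetical point-of-sale histories, one list of transactions per customer.
customers = [
    [{"shoes"}, {"racket", "racketball"}, {"sports jacket"}],
    [{"shoes"}, {"racket", "racketball", "socks"}, {"sports jacket"}],
    [{"shoes"}, {"racket", "racketball"}],
    [{"racket", "racketball"}, {"shoes"}, {"sports jacket"}],
]

antecedent = [{"shoes"}, {"racket", "racketball"}]
consequent = [{"sports jacket"}]
score = rule_confidence(customers, antecedent, consequent)
print(score)
```

The fourth customer bought the same items but in a different order, so that history does not match the antecedent; order is what distinguishes sequential patterns from plain association rules.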

17
Data Mining Systems
  • Clementine (SPSS)
  • http://www.spss.com/spssbi/clementine/index.htm
  • Data Miner (Statistica)
  • http://www.statsoft.com/dataminer.html
  • RuleQuest (C5.0)
  • http://www.rulequest.com/

18
Limitations/Challenges
  • large data
  • number of variables (features), number of cases
    (examples)
  • multi-gigabyte, terabyte databases
  • efficient algorithms, parallel processing
  • high dimensionality
  • large number of features: exponential increase in
    search space (potential for spurious patterns)
  • Use of domain knowledge
  • utilizing knowledge on complex data
    relationships, known facts
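
The exponential growth mentioned under high dimensionality can be illustrated with a quick count. The sketch assumes binary (present/absent) features and treats every non-empty combination of features as one candidate pattern, which is a simplification rather than something stated in the slides.

```python
def candidate_patterns(num_features):
    """Number of non-empty feature combinations: 2**n - 1."""
    return 2 ** num_features - 1

sizes = {n: candidate_patterns(n) for n in (10, 20, 40)}
for n, count in sizes.items():
    print(f"{n} features -> {count:,} candidate patterns")
```

Doubling the number of features squares the size of the search space (roughly), which is why exhaustive search becomes impractical and why some patterns will fit the data by chance alone.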

19
Intelligence Density Dimension
  • Accuracy
  • Explainability
  • Flexibility
  • Response speed