Data Mining Overview - PowerPoint PPT Presentation

Provided by: ate103

1
Data Mining Overview
  • Business Intelligence

2
Data Mining Defined
  • Knowledge discovery in databases
  • Extracting implicit, previously unknown
    information from large volumes of raw data

3
Instances and Features
  • Typically, the database will be a collection of
    instances
  • Each instance will have values for a given set of
    features
  • From database theory: instances are rows,
    features are columns
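As a minimal sketch, a small dataset can be held in Python as a list of rows (instances), with one position per column (feature). The feature names and values are hypothetical and only show the layout.

```python
# Hypothetical loan-application dataset: each row is an instance,
# each position within a row is a feature (column).
features = ["age", "income", "loan_amount"]

instances = [
    (34, 52000, 10000),
    (45, 87000, 25000),
    (23, 31000, 5000),
]

# One instance is a row; one feature is a column read across all rows.
second_instance = instances[1]
income_column = [row[features.index("income")] for row in instances]

print(second_instance)  # (45, 87000, 25000)
print(income_column)    # [52000, 87000, 31000]
```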

4
Classification
  • Supervised learning
  • Suppose instances have been categorized into
    classes and the database includes this
    categorization
  • Goal: using the knowledge in the database,
    classify a given instance

5
Classifiers
[Figure: feature values X1, X2, X3, ..., Xn are input to a classifier, which outputs a category Y; the classifier draws on a DB, a collection of instances with known categories]

6
Classifier intelligence
  • A classifier's intelligence will be based on a
    dataset consisting of instances with known
    categories
  • Typical goal of a classifier: predict the
    category of a new instance in a way that is
    rationally consistent with the dataset

7
BI Examples
  • A loan officer in a bank uses a system that
    automatically approves or disapproves a loan
    application based on previous loan applications
    and decisions
  • An admissions officer in a university uses a
    system that automatically makes an admission
    decision (accept, reject, wait-list), based on
    previous applicants' data and the decisions made
    on them

8
Data mining method example: k-nearest neighbors
  • For a given instance T, get the top k database
    instances that are nearest to T
  • Select a reasonable distance measure
  • Inspect the categories of these k instances, and
    choose the category C that represents the most instances
  • Conclude that T belongs to category C
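The slide leaves the distance measure open. As a minimal sketch, assuming numeric features and Euclidean distance, the k-nearest-neighbors steps could look like this in Python. The function names (`euclidean`, `knn_classify`), the choice k = 3, and the loan data are hypothetical illustrations, not part of the original presentation.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(database, t, k, distance=euclidean):
    """Classify instance t by majority vote among its k nearest neighbors.

    database: list of (feature_vector, category) pairs with known categories.
    """
    # Get the k database instances nearest to t under the chosen distance.
    nearest = sorted(database, key=lambda item: distance(item[0], t))[:k]
    # Count the categories of those k instances and pick the most common one.
    votes = Counter(category for _, category in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical past loan decisions: (income, debt) in thousands -> decision
db = [
    ((60, 5), "approve"),
    ((75, 10), "approve"),
    ((80, 2), "approve"),
    ((25, 30), "reject"),
    ((30, 25), "reject"),
    ((20, 40), "reject"),
]

print(knn_classify(db, (70, 8), k=3))   # approve
print(knn_classify(db, (28, 28), k=3))  # reject
```

With two categories, an odd k avoids tied votes. If a tie does occur, `Counter.most_common` keeps the category that appears first in the distance-sorted neighbor list.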

9
Clustering
  • Unsupervised learning
  • Classes/categories are not known, but unexpected
    groupings (clusters) are discovered
  • Clustering provides insight into the population
    segments

10
Clustering
[Figure: instances plotted by Feature 1 and Feature 2]

11
Goal of Clustering
  • Input: the database of instances, and possibly
    some predetermined number of clusters
  • Output: the same database of instances,
    partitioned into clusters

12
BI Examples
  • After clustering the current university student
    population, it was discovered that there is a
    large group of female marketing majors coming
    from a particular exclusive school who tend to
    get high grades
  • Business response: focus recruitment on that
    school; push the university's marketing program
  • Customer segment characteristics and spending
    patterns can direct business strategies

13
Data mining method example: k-means
  • Guess the number of clusters (k)
  • Guess cluster centers from the samples (these
    will be called centroids)
  • Determine cluster membership based on the
    distance from the centroids
  • Repeatedly refine the centroids by taking the
    average (mean) of the members of each cluster,
    then re-determine membership, until the clusters
    stop changing
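A minimal Python sketch of these steps follows. It assumes numeric features, Euclidean distance (`math.dist`, Python 3.8+), initial centroids picked at random from the samples, and a stopping rule of "no membership changes". The function name `kmeans`, the `max_iters` and `seed` parameters, and the sample points are hypothetical.

```python
import math
import random

def kmeans(points, k, max_iters=100, seed=0):
    """Partition points (tuples of numbers) into k clusters.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    max_iters and seed are hypothetical tuning knobs for this sketch.
    """
    rng = random.Random(seed)
    # Guess the initial centroids by picking k distinct samples.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Determine membership: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # membership stopped changing, so the clusters are stable
        labels = new_labels
        # Refine each centroid as the mean of its cluster's members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels

# Hypothetical two-feature data with two visible groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

Because the initial guess is random, different seeds can produce different (sometimes worse) partitions; running several seeds and keeping the tightest result is a common remedy.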

14
Summary
  • Two sub-areas of data mining have been discussed:
    supervised (classification) and unsupervised
    (clustering) learning methods
  • For both types of methods, intelligent systems
    can be created to support business decision making