1
CS/EngMt/CpEng 404 Data Mining & Knowledge Discovery
  • Daniel C. St. Clair, PhD, University of MO Rolla
  • Christopher Merz, PhD, Mastercard International
  • Lect 1: Intro. to Data Mining

2
Lecture 1 Contents
  • Intro to CS/EMgt/CpE 404
  • What is data mining & KD?
  • Data sources
  • Data mining tasks
  • Introduction to CRISP-DM

3
Information Age Produces Large Amounts of Data
  • Data collected on almost everything
  • WWW is a rich data resource
  • Data warehouses required to hold data

4
  • The problem
    • How do we turn information into useful
      knowledge?
  • Solution
    • Data mining & knowledge discovery

5
The Knowledge Discovery Process
Source: Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P.,
"From Data Mining to Knowledge Discovery in Databases,"
AI Magazine, Fall 1996.

6
What's in this class?
  • This class provides:
    • Tools & techniques for producing useful knowledge
      from information
    • Experience in using these tools

7
Data Mining & Knowledge Discovery in CS 404
  • Tools
    • Association rules
    • Classification & prediction
    • Neural networks
    • Classical tools
      • Correlation
      • Regression
      • Clustering
  • Projects requiring mining knowledge from real
    data

8
CS 404 Class Information
  • Instructors
    • Daniel C. St. Clair, PhD, University of MO Rolla
      Phone: (573) 341-6352, e-mail: stclair@umr.edu
    • Christopher Merz, PhD, Mastercard
      Phone: (636) 722-2143, e-mail: merzc@umr.edu
  • CS 404 web page: www.umr.edu/stclair or
    http://web.umr.edu/stclair/class/classfiles/cs404_ws04

9
CS 404 Class Information
  • Prerequisites
    • CS 347 (Artificial Intelligence) or CS 304
      (first course in DB systems), ability to program
      in some language,
    • and Stat 215 (calculus-based statistics)
  • Texts
    • Han, J. & Kamber, M., Data Mining: Concepts and
      Techniques, Morgan Kaufmann, 2000.
    • Handouts

10
CS 404 Class Information
  • Software
    • Weka Machine Learning Software (weka-3-4jre.exe):
      free program & documentation (download from
      http://www.cs.waikato.ac.nz/ml/weka/index.html)
    • Matlab: you can log in to UMR for Matlab access.
      Don't purchase a copy.
    • Microsoft Excel (provided on UMR CLC computers)

11
Download
14
Class Format
  • Streaming internet video
  • Class includes local and distance students
  • Two-way telephone connection w/ students
  • Instructors available by phone, e-mail, and fax
  • Lectures archived
  • Exams & presentations
  • cs404-l@umr.edu: class listserv
  • NOTE: All students in this class have a UMR
    e-mail account.

15
Questions?
17
Data -- Information -- Knowledge
Knowledge can be created from information.
18
What Is Data Mining? How Does It Differ From
Existing Database Technologies?
  • Data Sources: databases, data warehouses,
    Internet
  • Decision Support Systems: tools for asking
    questions & doing analyses when you know what
    you want to ask and where you are going
    (Ex. OLAP tools)
  • Data Mining: process of discovering knowledge
    (meaningful new correlations, patterns, and
    trends) in data by sifting through large amounts
    of data using pattern recognition as well as
    statistical and mathematical techniques
19
Why Data Mining?
  • Data overload
    • More records
    • Higher record complexity (text, graphical, audio,
      video)
  • Some applications of data mining
    • Business/Industry
      • Competitive edge for business
      • Increase market share
      • Fraud reduction
      • Improve products/processes
    • Find new solutions to difficult problems
    • Text mining

20
Data Mining Example
21
Simple Concept Learning -- Example
  • Routine, well-understood chemistry experiment
    performed numerous times
    • Expected result occurred about half the time
    • Unexpected result occurred the remainder of the time
    • Numerous repetitions of the experiment produced
      similar results
  • Careful analysis determined:
    • One result was produced when the setup was in sunlight
    • The second result was produced when the setup was in shade
  • Careful investigation showed:
    • The experiment was sensitive to ultraviolet radiation
  • Result:
    • Patented method for determining the presence of
      ultraviolet radiation
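The sunlight/shade discovery above is, at heart, a search for an attribute whose value alone determines the outcome. A minimal Python sketch of that search follows; the run log, the attribute names (`location`, `operator`, `day`), and the result labels are hypothetical, and which condition produced which result is made up, since the slide does not say.

```python
# Hypothetical run log: one record per repetition of the experiment.
# Attribute names, values, and result labels are illustrative only.
runs = [
    {"location": "sunlight", "operator": "A", "day": "Mon", "result": "result 1"},
    {"location": "shade",    "operator": "A", "day": "Tue", "result": "result 2"},
    {"location": "sunlight", "operator": "B", "day": "Tue", "result": "result 1"},
    {"location": "shade",    "operator": "B", "day": "Mon", "result": "result 2"},
    {"location": "shade",    "operator": "A", "day": "Wed", "result": "result 2"},
    {"location": "sunlight", "operator": "B", "day": "Wed", "result": "result 1"},
]

def determining_attributes(records, target="result"):
    """Return the attributes whose value alone always predicts the target."""
    candidates = [a for a in records[0] if a != target]
    found = []
    for attr in candidates:
        outcome_for_value = {}
        # setdefault stores the first outcome seen for each value;
        # any later disagreement means attr does not determine the target.
        if all(outcome_for_value.setdefault(r[attr], r[target]) == r[target]
               for r in records):
            found.append(attr)
    return found

print(determining_attributes(runs))  # → ['location']
```

Real data is rarely this clean; practical concept learners tolerate some inconsistent records instead of demanding a perfect split.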

24
Data Sources
  • Relational Databases
  • Data Warehouses
  • WWW
  • Audio
  • Video
  • Printed Materials

25
Relational Databases
27
Data Mining Tasks
  • Predictive
    • Perform inference on current data in order to
      make predictions
  • Descriptive (KDD)
    • Characterize general properties of data
  • Notes
    • A measure of certainty or belief must be
      associated with each pattern
    • Interesting patterns must be identified

28
Kinds of Data Patterns to Be Mined
  • Concept/class description
  • Association analyses
  • Classification & prediction
  • Cluster analysis
  • Outlier analysis

29
Concept/class Descriptions
  • Example 1
    • Produce a description summarizing
      characteristics of customers who purchase diapers
    • Objective: produce a description of those in the
      target class
    • Characterizes class/concept
  • Example 2
    • What properties distinguish diaper buyers from
      other store customers?
    • Discriminates class/concept
  • Leads to other questions
    • What else do they buy?
    • When do they purchase these items?
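Characterization (Example 1) and discrimination (Example 2) can be sketched as counting attribute values inside the target class and then comparing those counts against a contrasting class. The customer records and attribute names below (`age_group`, `has_children`, `bought_diapers`) are hypothetical illustration data, and the helper functions are not taken from any particular data mining tool.

```python
from collections import Counter

# Hypothetical store customers; all values are made up for illustration.
customers = [
    {"age_group": "25-34", "has_children": True,  "bought_diapers": True},
    {"age_group": "25-34", "has_children": True,  "bought_diapers": True},
    {"age_group": "35-44", "has_children": True,  "bought_diapers": True},
    {"age_group": "25-34", "has_children": False, "bought_diapers": True},
    {"age_group": "18-24", "has_children": False, "bought_diapers": False},
    {"age_group": "35-44", "has_children": False, "bought_diapers": False},
    {"age_group": "25-34", "has_children": False, "bought_diapers": False},
    {"age_group": "45-54", "has_children": True,  "bought_diapers": False},
]

def value_shares(records, attr):
    """Fraction of records taking each value of attr."""
    counts = Counter(r[attr] for r in records)
    return {value: n / len(records) for value, n in counts.items()}

def characterize(target, attrs):
    """Characterization: summarize the target class on its own."""
    return {attr: value_shares(target, attr) for attr in attrs}

def discriminate(target, contrast, attr):
    """Discrimination: how much more common each value is in target than in contrast."""
    t = value_shares(target, attr)
    c = value_shares(contrast, attr)
    return {v: t.get(v, 0.0) - c.get(v, 0.0) for v in t.keys() | c.keys()}

diaper_buyers = [c for c in customers if c["bought_diapers"]]
other_customers = [c for c in customers if not c["bought_diapers"]]

print(characterize(diaper_buyers, ["age_group", "has_children"]))
print(discriminate(diaper_buyers, other_customers, "has_children"))
```

A positive difference marks a value that is more typical of diaper buyers than of other customers; a characterization alone cannot reveal this, because it never looks at the contrasting class.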

30
Association Analysis
  • Association analysis: discovery of association
    relationships between attribute-value conditions
  • Such relationships may be expressed in many ways.
    One common way is through association rules:

X ⇒ Y
31
Association Rules
Example:
  age(X, "20..29") ∧ income(X, "20K..29K") ⇒ buys(X, "CD changer")
  [support = 2%, confidence = 60%]
  • Support: % of data instances satisfying all three
    components of the rule
  • Confidence: % of the data instances satisfying the
    hypothesis (left-hand side) for which the conclusion
    is also satisfied
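Support and confidence as defined above can be computed by counting records. The sketch below runs over ten made-up customer records (so it yields 30% support rather than the slide's 2%); it assumes the range 20K..29K means an income of at least $20,000 and below $30,000, and the field names `age`, `income`, and `buys` are hypothetical.

```python
# Hypothetical customer records; all values are made up for illustration.
records = [
    {"age": 23, "income": 25_000, "buys": {"CD changer"}},
    {"age": 27, "income": 21_000, "buys": {"CD changer"}},
    {"age": 29, "income": 28_000, "buys": {"CD changer", "speakers"}},
    {"age": 22, "income": 24_000, "buys": {"speakers"}},
    {"age": 25, "income": 29_500, "buys": {"headphones"}},
    {"age": 35, "income": 25_000, "buys": {"CD changer"}},
    {"age": 45, "income": 60_000, "buys": {"speakers"}},
    {"age": 19, "income": 22_000, "buys": {"CD changer"}},
    {"age": 52, "income": 40_000, "buys": {"headphones"}},
    {"age": 31, "income": 35_000, "buys": {"CD changer"}},
]

def lhs(r):
    # age(X, "20..29") AND income(X, "20K..29K"); assumes 20K..29K means [20,000, 30,000)
    return 20 <= r["age"] <= 29 and 20_000 <= r["income"] < 30_000

def rhs(r):
    # buys(X, "CD changer")
    return "CD changer" in r["buys"]

def support_confidence(data, antecedent, consequent):
    n_lhs = sum(1 for r in data if antecedent(r))
    n_both = sum(1 for r in data if antecedent(r) and consequent(r))
    support = n_both / len(data)                    # share of ALL records matching the whole rule
    confidence = n_both / n_lhs if n_lhs else 0.0   # share of LHS matches where the RHS also holds
    return support, confidence

s, c = support_confidence(records, lhs, rhs)
print(f"support = {s:.0%}, confidence = {c:.0%}")  # → support = 30%, confidence = 60%
```

Support divides by all records while confidence divides only by the records matching the left-hand side, which is why a rule can have high confidence yet low support, as in the slide's 2%/60% example.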
32
Classification & Prediction
Source: Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P.,
"From Data Mining to Knowledge Discovery in Databases,"
AI Magazine, Fall 1996.
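A linear classifier separates two classes with a straight line. One classic way to learn such a line is the perceptron algorithm, substituted here purely as an illustration, since the slide does not say how its boundary was obtained. The (income, debt) points and labels are hypothetical. The perceptron converges only when the classes are linearly separable, as they are in this toy data, so the loop is capped at `epochs` passes.

```python
# Hypothetical loan applicants: ((income, debt) in $1000s, label).
# Label +1 = good loan, -1 = defaulted. Values are made up for illustration.
data = [
    ((60, 10), 1), ((75, 20), 1), ((50, 5), 1), ((80, 30), 1),
    ((20, 30), -1), ((30, 40), -1), ((25, 15), -1), ((40, 45), -1),
]

def train_perceptron(samples, epochs=100):
    """Classic perceptron: nudge the boundary toward each misclassified point."""
    w = [0.0, 0.0]  # one weight per feature (income, debt)
    b = 0.0         # bias term lets the boundary avoid the origin
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), y in samples:
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:  # wrong side of (or on) the line
                w[0] += y * x1
                w[1] += y * x2
                b += y
                mistakes += 1
        if mistakes == 0:  # every training point is on the correct side
            break
    return w, b

def predict(w, b, x):
    """Classify a point by which side of the line w.x + b = 0 it falls on."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else -1

w, b = train_perceptron(data)
print(w, b)
print(predict(w, b, (70, 15)), predict(w, b, (30, 35)))  # → 1 -1
```

When no straight line separates the classes (the nonlinear case on the next slide), this loop simply runs out of epochs; a nonlinear model is needed instead.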
33
Classification (nonlinear)
Source: Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P.,
"From Data Mining to Knowledge Discovery in Databases,"
AI Magazine, Fall 1996.
34
Cluster Analysis
Source: Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P.,
"From Data Mining to Knowledge Discovery in Databases,"
AI Magazine, Fall 1996.
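Cluster analysis groups records so that members of a group are close to one another. k-means is one common clustering technique, shown here as a minimal sketch rather than as the method behind the slide's figure. The points are hypothetical, and seeding the centroids with the first `k` points is a simplifying assumption; practical implementations usually use random or k-means++ initialization.

```python
import math

# Hypothetical 2-D records (e.g., two numeric attributes per customer).
points = [
    (1, 1), (10, 10), (20, 1),
    (1.5, 2), (2, 1), (10.5, 9), (9, 11), (21, 2), (19, 1.5),
]

def kmeans(points, k, max_iterations=100):
    """Minimal k-means: alternate assigning points and recomputing centroids."""
    centroids = list(points[:k])  # simplifying assumption: first k points seed the clusters
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments have stabilized
            break
        centroids = new_centroids
    return centroids, clusters

centroids, clusters = kmeans(points, k=3)
for center, members in zip(centroids, clusters):
    print(center, members)
```

Unlike classification, no labels are given here: the groups emerge only from distances between records, which is why the choice of attributes and their scaling strongly affects the clusters found.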
35
Some Major Data Mining Issues
  • Mining methodologies
  • User interaction
  • Performance (accuracy, robustness)
  • Heterogeneous databases
  • Interestingness
