Title: CS/EngMt/CpEng 404 Data Mining
1 CS/EngMt/CpEng 404: Data Mining and Knowledge Discovery
- Daniel C. St. Clair, PhD, University of MO Rolla
- Christopher Merz, PhD, Mastercard International
- Lecture 1: Intro. to Data Mining
2 Lecture 1 Contents
- Intro to CS/EMgt/CpE 404
- What is data mining and KD?
- Data sources
- Data mining tasks
- Introduction to CRISP-DM
3 Information Age Produces Large Amounts of Data
- Data collected on almost everything
- WWW is a rich data resource
- Data warehouses required to hold data
4
- The problem
- How do we turn information into useful knowledge?
- Solution
- Data mining and knowledge discovery
5 The Knowledge Discovery Process
Source: Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P., "From Data Mining to Knowledge Discovery in Databases," AI Magazine, Fall 1996.
6 What's in this class?
- This class provides
- Tools and techniques for producing useful knowledge from information
- Experience in using these tools
7 Data Mining and Knowledge Discovery in CS 404
- Tools
- Association rules
- Classification and prediction
- Neural networks
- Classical tools
- Correlation
- Regression
- Clustering
- Projects requiring mining knowledge from real data
8 CS 404 Class Information
Instructors:
- Daniel C. St. Clair, PhD, University of MO Rolla. Phone: (573) 341-6352. E-mail: stclair_at_umr.edu
- Christopher Merz, PhD, Mastercard. Phone: (636) 722-2143. E-mail: merzc_at_umr.edu
CS 404 web page: www.umr.edu/stclair or http://web.umr.edu/stclair/class/classfiles/cs404_ws04
9 CS 404 Class Information
- Prerequisites
- CS 347 (Artificial Intelligence) or CS 304 (first course in DB systems), ability to program in some language
- and Stat 215 (calculus-based statistics)
- Texts
- Han, J. and Kamber, M., Data Mining: Concepts and Techniques, Morgan Kaufmann, 2000.
- Handouts
10 CS 404 Class Information
- Software
- Weka Machine Learning Software (weka-3-4jre.exe): free program and documentation (download from http://www.cs.waikato.ac.nz/ml/weka/index.html)
- Matlab: You can log in to UMR for Matlab access. Don't purchase a copy.
- Microsoft Excel (provided on UMR CLC computers)
14 Class Format
- Streaming internet video
- Class includes local and distance students
- Two-way telephone connection with students
- Instructors available by phone, e-mail, and fax
- Lectures archived
- Exams and presentations
- cs404-l_at_umr.edu -- Class listserv
- NOTE: ALL students in this class have a UMR e-mail account
15 Questions?
17 Data -- Information -- Knowledge
Knowledge can be created from information.
18 What Is Data Mining? How Does It Differ From Existing Database Technologies?
Data Sources: Databases, data warehouses, Internet
Decision Support Systems: Tools for asking questions and doing analyses when you know what you want to ask and where you are going (e.g., OLAP tools).
Data Mining: Process of discovering knowledge (meaningful new correlations, patterns, and trends) in data by sifting through large amounts of data using pattern recognition as well as statistical and mathematical techniques.
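The contrast between the two styles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names (`region`, `ad_spend`, `visits`, `returns`), and both helper functions are hypothetical and do not come from the lecture. The first function answers a question we already know how to ask; the second scans every pair of numeric fields and reports the strongest correlation without being told where to look.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy records; all field names and values are made up.
records = [
    {"region": "east", "ad_spend": 10, "visits": 120, "returns": 7},
    {"region": "east", "ad_spend": 20, "visits": 210, "returns": 3},
    {"region": "west", "ad_spend": 30, "visits": 330, "returns": 8},
    {"region": "west", "ad_spend": 40, "visits": 390, "returns": 4},
    {"region": "west", "ad_spend": 50, "visits": 520, "returns": 6},
]
NUMERIC_FIELDS = ["ad_spend", "visits", "returns"]


def total_visits(rows, region):
    """Decision-support style: a fixed question chosen by the analyst."""
    return sum(r["visits"] for r in rows if r["region"] == region)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def strongest_pair(rows, fields):
    """Mining style: search all field pairs for the strongest correlation."""
    def strength(pair):
        a, b = pair
        return abs(pearson([r[a] for r in rows], [r[b] for r in rows]))
    return max(combinations(fields, 2), key=strength)


print(total_visits(records, "east"))
print(strongest_pair(records, NUMERIC_FIELDS))
```

On real data the search space is far larger than three fields, which is why the mining side relies on pattern recognition and statistical techniques rather than exhaustive scanning.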
19 Why Data Mining?
- Data overload
- More records
- Higher record complexity (text, graphical, audio, video)
- Some applications of data mining
- Business/Industry
- Competitive edge for business
- Increase market share
- Fraud reduction
- Improve products/processes
- Find new solutions to difficult problems
- Text mining
20 Data Mining Example
21 Simple Concept Learning -- Example
- Routine, well-understood chemistry experiment performed numerous times.
- Expected result occurred about half the time
- Unexpected result occurred remainder of the time
- Numerous repetitions of experiment produced similar results
- Careful analysis determined
- One result produced when setup was in sunlight
- Second result produced when setup was in shade
- Careful investigation showed
- Experiment sensitive to ultraviolet radiation
- Result
- Patented method for determining presence of ultraviolet radiation
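The "careful analysis" step in this example amounts to asking which recorded condition, if any, fully determines the outcome. A minimal sketch, assuming a hypothetical experiment log: the attribute names (`light`, `operator`, `time`), the values, and the pairing of sunlight with the expected result are all invented for illustration, since the slide does not say which result went with which condition.

```python
from collections import defaultdict

# Hypothetical experiment log; attributes and values are made up.
runs = [
    {"light": "sunlight", "operator": "A", "time": "am", "result": "expected"},
    {"light": "shade", "operator": "A", "time": "am", "result": "unexpected"},
    {"light": "sunlight", "operator": "B", "time": "pm", "result": "expected"},
    {"light": "shade", "operator": "B", "time": "pm", "result": "unexpected"},
    {"light": "sunlight", "operator": "A", "time": "pm", "result": "expected"},
    {"light": "shade", "operator": "B", "time": "am", "result": "unexpected"},
]


def determining_attributes(rows, target):
    """Return attributes whose value alone fixes the target in every row."""
    found = []
    for attr in rows[0]:
        if attr == target:
            continue
        # Collect the set of target values seen for each attribute value.
        outcomes = defaultdict(set)
        for row in rows:
            outcomes[row[attr]].add(row[target])
        # The attribute determines the target if no value maps to two outcomes.
        if all(len(seen) == 1 for seen in outcomes.values()):
            found.append(attr)
    return found


print(determining_attributes(runs, "result"))
```

With this toy log only `light` separates the two outcomes. Real data is rarely this clean, so practical methods score attributes by how well they separate classes instead of demanding a perfect split.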
24 Data Sources
- Relational Databases
- Data Warehouses
- WWW
- Audio
- Video
- Printed Materials
25 Relational Databases
27 Data Mining Tasks
- Predictive
- Perform inference on current data in order to make predictions
- Descriptive (KDD)
- Characterize general properties of data
- Notes
- A measure of certainty or belief must be associated with each pattern
- Interesting patterns must be identified
28 Kinds of Data Patterns to Be Mined
- Concept/class description
- Association analysis
- Classification and prediction
- Cluster analysis
- Outlier analysis
29 Concept/class Descriptions
- Example 1
- Produce a description summarizing characteristics of customers who purchase diapers
- Objective: produce a description of those in the target class
- Characterizes class/concept
- Example 2
- What properties distinguish diaper buyers from other store customers?
- Discriminates class/concept
- Leads to other questions
- What else do they buy?
- When do they purchase these items?
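The two examples above can be sketched side by side. This is a rough illustration, not a method from the course: the customer records, the attribute names (`age_band`, `has_children`), and both helper functions are hypothetical. Characterization summarizes the target class on its own; discrimination compares the target class against the contrasting class.

```python
from collections import Counter

# Hypothetical customer records; attributes and values are made up.
customers = [
    {"age_band": "20-29", "has_children": True, "buys_diapers": True},
    {"age_band": "30-39", "has_children": True, "buys_diapers": True},
    {"age_band": "20-29", "has_children": True, "buys_diapers": True},
    {"age_band": "20-29", "has_children": False, "buys_diapers": False},
    {"age_band": "50-59", "has_children": False, "buys_diapers": False},
    {"age_band": "30-39", "has_children": True, "buys_diapers": False},
]
ATTRS = ["age_band", "has_children"]

buyers = [c for c in customers if c["buys_diapers"]]
others = [c for c in customers if not c["buys_diapers"]]


def value_shares(rows, attr):
    """Fraction of rows taking each value of attr."""
    counts = Counter(r[attr] for r in rows)
    return {value: n / len(rows) for value, n in counts.items()}


def characterize(rows, attrs):
    """Characterization: most common value of each attribute and its share."""
    summary = {}
    for attr in attrs:
        shares = value_shares(rows, attr)
        top = max(shares, key=shares.get)
        summary[attr] = (top, shares[top])
    return summary


def discriminate(target, contrast, attrs):
    """Discrimination: share in target class minus share in contrast class."""
    gaps = {}
    for attr in attrs:
        t, c = value_shares(target, attr), value_shares(contrast, attr)
        for value in set(t) | set(c):
            gaps[(attr, value)] = t.get(value, 0.0) - c.get(value, 0.0)
    return gaps


print(characterize(buyers, ATTRS))
print(discriminate(buyers, others, ATTRS))
```

A large positive gap marks a value that is much more common among buyers than among other customers, which is the kind of property a discriminating description would report.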
30 Association Analysis
- Association analysis -- discovery of association relationships between attribute-value conditions.
- Such relationships may be expressed in many ways. One common way is through association rules.
X => Y
31 Association Rules
Example: age(X, 20..29) AND income(X, 20K..29K) => buys(X, CD changer) [support = 2%, confidence = 60%]
Support: % of data instances satisfying all three components of the rule
Confidence: % of data instances satisfying the hypothesis (the age and income conditions) for which the conclusion also holds
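Support and confidence for a rule of this shape can be computed directly from counts. The sketch below assumes a hypothetical data set of ten people stored as (age, income in thousands, buys CD changer) tuples; the numbers are invented so the arithmetic is easy to follow and do not reproduce the 2% and 60% figures in the slide's example.

```python
# Hypothetical data: (age, income in thousands, buys a CD changer).
people = [
    (22, 25, True), (27, 21, True), (29, 29, True), (24, 20, False),
    (20, 28, False), (35, 25, True), (45, 60, False), (23, 40, False),
    (19, 22, False), (31, 35, True),
]


def antecedent(person):
    """Hypothesis of the rule: age in 20..29 and income in 20K..29K."""
    age, income_k, _ = person
    return 20 <= age <= 29 and 20 <= income_k <= 29


def consequent(person):
    """Conclusion of the rule: the person buys a CD changer."""
    return person[2]


def support_and_confidence(rows, lhs, rhs):
    """Support = P(lhs and rhs); confidence = P(rhs | lhs)."""
    n_lhs = sum(1 for r in rows if lhs(r))
    n_both = sum(1 for r in rows if lhs(r) and rhs(r))
    support = n_both / len(rows)
    confidence = n_both / n_lhs if n_lhs else 0.0
    return support, confidence


support, confidence = support_and_confidence(people, antecedent, consequent)
print(f"support={support:.0%} confidence={confidence:.0%}")
```

In this toy set five people satisfy the hypothesis and three of them buy, so three of ten instances satisfy the whole rule (support) and three of five hypothesis matches are predicted correctly (confidence).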
32 Classification and Prediction
Source: Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P., "From Data Mining to Knowledge Discovery in Databases," AI Magazine, Fall 1996.
33 Classification (nonlinear)
Source: Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P., "From Data Mining to Knowledge Discovery in Databases," AI Magazine, Fall 1996.
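One simple classifier that produces a nonlinear decision boundary is nearest neighbor, sketched below. The slide does not name a method, so treat this choice as an assumption made for illustration; the training points and the class labels "A" and "B" are hypothetical. The points are laid out in an XOR pattern, which no single straight line can separate.

```python
# Hypothetical labeled training points in an XOR layout.
train = [
    ((1, 1), "A"),
    ((1, 9), "B"),
    ((9, 1), "B"),
    ((9, 9), "A"),
]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbor_label(examples, point):
    """Predict the label of the single closest training example (1-NN)."""
    _, label = min(examples, key=lambda item: sq_dist(item[0], point))
    return label


for query in [(2, 2), (8, 2), (2, 8), (8, 8)]:
    print(query, nearest_neighbor_label(train, query))
```

Each query point takes the class of its closest corner, so opposite corners share a class even though adjacent corners do not.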
34 Cluster Analysis
Source: Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P., "From Data Mining to Knowledge Discovery in Databases," AI Magazine, Fall 1996.
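Cluster analysis groups unlabeled points by similarity. A minimal sketch using k-means, one common clustering method: the slide does not say which algorithm is meant, so this choice, the sample points, and the starting centroids are all assumptions made for illustration.

```python
# Hypothetical unlabeled 2-D points forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(pts) for coord in zip(*pts))


def kmeans(pts, centroids, iterations=10):
    """Alternate between assigning points and re-centering the centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in pts:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [mean_point(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


centroids, clusters = kmeans(points, [points[0], points[3]])
print(centroids)
print(clusters)
```

The starting centroids are passed in explicitly to keep the run deterministic; practical implementations usually pick them at random and run until the assignments stop changing.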
35 Some Major Data Mining Issues
- Mining methodologies
- User interaction
- Performance (accuracy, robustness)
- Heterogeneous databases
- Interestingness