Title: Data Mining Using IBM Intelligent Miner
1Data Mining Using IBM Intelligent Miner
- Presented by
- Qiyan (Jennifer) Huang
2Outline
- Introduction
- Mining Process
- Main Functionalities of Intelligent Miner
- Other Data Mining Products
- Data Mining and Privacy
- Summary
- References
3What is Data Mining
- Data mining: discovering interesting patterns from large amounts of data
- Also known as knowledge discovery (mining) in databases (KDD), data/pattern analysis, information harvesting, business intelligence, etc.
4Evolution of Database Technology
- 1960s
- Data collection, database creation
- 1970s
- Relational data model, relational DBMS implementation
- 1980s to present
- RDBMS, advanced data models
- 1990s to 2000s
- Data mining and data warehousing, multimedia databases, and Web databases
5Data Mining VS. Database Query
- Database query
- Identify customers who have purchased more than $10,000 in the last month
- Find all customers who have purchased milk
- Data mining
- Identify customers with similar buying habits (clustering)
- Find all items that are frequently purchased with milk (association rules)
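The contrast can be sketched in a few lines of Python. The purchase records, customer names, and field names below are hypothetical, and the "last month" date filter is left out for brevity. The two queries state their condition up front, while the mining step only counts what turns up alongside milk.

```python
from collections import Counter

# Hypothetical purchase records; names and amounts are invented for illustration.
purchases = [
    {"customer": "ann", "items": {"milk", "bread"}, "total": 12000},
    {"customer": "bob", "items": {"milk", "cereal", "bread"}, "total": 300},
    {"customer": "cat", "items": {"tea"}, "total": 15000},
    {"customer": "dan", "items": {"milk", "cereal"}, "total": 800},
]

# Database query: the condition is fully specified in advance.
big_spenders = [p["customer"] for p in purchases if p["total"] > 10000]
milk_buyers = [p["customer"] for p in purchases if "milk" in p["items"]]

# Data mining: the pattern is not specified; we count which items
# co-occur with milk and let the frequent ones emerge from the data.
bought_with_milk = Counter(
    item
    for p in purchases
    if "milk" in p["items"]
    for item in p["items"]
    if item != "milk"
)

print(big_spenders)
print(milk_buyers)
print(bought_with_milk.most_common())
```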
6Data Mining Process (KDD)
[Figure: the KDD process as a sequence of stages]
Databases -> Data Cleaning -> Data Warehouse -> Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation -> Knowledge
Source: J. Han and M. Kamber, Data Mining: Concepts and Techniques, 2001
7About DB2 Intelligent Miner
- DB2 Intelligent Miner for Data focuses on large-scale mining: large volumes of data and parallel data mining on Windows NT, Sun Solaris, and OS/390 (IBM)
8Main Functionalities
- Cluster analysis
- Group the data that share similar trends and patterns
- Classification
- Predict the outcome based on historical data
- Association analysis
- Find frequent patterns
18Classification
This follows an example from Quinlan's ID3
20Classification
21Classification
This follows an example from Quinlan's ID3
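The transcript does not reproduce the example itself, so the following is only a minimal sketch of the core ID3 step: choose the attribute with the highest information gain. The four-row table and its attribute names are hypothetical, not Quinlan's data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_attribute(rows, attributes, target):
    """ID3 picks the attribute with the largest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical training rows, invented for illustration.
rows = [
    {"student": "yes", "credit": "fair", "buys": "yes"},
    {"student": "yes", "credit": "excellent", "buys": "yes"},
    {"student": "no", "credit": "fair", "buys": "no"},
    {"student": "no", "credit": "excellent", "buys": "no"},
]

print(best_attribute(rows, ["student", "credit"], "buys"))
```

In this toy table "student" separates the classes perfectly while "credit" tells us nothing, so the split falls on "student". A full ID3 implementation would repeat the same choice recursively on each subset.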
22Association
- An association rule identifies relationships
- Example
- 30% of customers buy shirts across all transactions; 60% of these customers will also buy a tie
- Confidence factor is 60%
- Support: if buying a shirt and a tie together is observed in 12% of all transactions, then the support is 12%
- Lift: 60% / 30% = 2
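To make the three measures concrete, here is a small sketch that computes them from raw baskets. The ten transactions are made up and do not reproduce the slide's percentages. The sketch assumes the common textbook definitions, in particular lift = confidence / support of the rule head; treat that as an assumption about what the slide intends.

```python
# Hypothetical baskets, invented for illustration.
transactions = [
    {"shirt", "tie"},
    {"shirt", "tie"},
    {"shirt", "tie"},
    {"shirt"},
    {"shirt", "socks"},
    {"tie"},
    {"socks"},
    {"socks"},
    {"belt"},
    {"belt", "socks"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(body, head, transactions):
    """Of the transactions containing body, the fraction that also contain head."""
    return support(body | head, transactions) / support(body, transactions)


def lift(body, head, transactions):
    """Confidence relative to how often head occurs anyway (assumed definition)."""
    return confidence(body, head, transactions) / support(head, transactions)


body, head = {"shirt"}, {"tie"}
print("support   ", support(body | head, transactions))
print("confidence", confidence(body, head, transactions))
print("lift      ", lift(body, head, transactions))
```

A lift above 1 means the body and head occur together more often than they would if they were independent.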
23Association
Support (%)  Confidence (%)  Type  Lift    Rule Body  => Rule Head
5.5286       34.0800               2.7300  203, 1207  => 1716
7.0388       34.1300               2.7400  203, 1719  => 1716
5.4662       34.1700               2.7400  202, 802   => 1716
5.8805       34.3400               2.7500  203, 802   => 1716
5.0163       34.4900               2.7600  203, 705   => 1716
7.1279       34.7400               2.7800  202, 1718  => 1716
5.8226       34.7600               3.3900  711, 203   => 710
5.0697       34.8300               2.7400  202, 1702  => 1703
5.2836       34.8300               2.7400  202, 1207  => 1703
5.4350       34.9400               3.4100  201, 711   => 710
5.3459       35.0200               2.7600  201, 1702  => 1703
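One way to sanity-check a listing like this is to back out how often each rule head occurs on its own. The sketch assumes lift = confidence / support(head), so confidence / lift should give roughly the same head support for every rule that shares a head. The tuples copy the support, confidence, lift, and head item from the listing above; the variable names are my own.

```python
from collections import defaultdict

# (support %, confidence %, lift, head item), copied from the listing above.
rules = [
    (5.5286, 34.08, 2.73, 1716),
    (7.0388, 34.13, 2.74, 1716),
    (5.4662, 34.17, 2.74, 1716),
    (5.8805, 34.34, 2.75, 1716),
    (5.0163, 34.49, 2.76, 1716),
    (7.1279, 34.74, 2.78, 1716),
    (5.8226, 34.76, 3.39, 710),
    (5.0697, 34.83, 2.74, 1703),
    (5.2836, 34.83, 2.74, 1703),
    (5.4350, 34.94, 3.41, 710),
    (5.3459, 35.02, 2.76, 1703),
]

# Under the assumed definition, support(head) = confidence / lift.
implied_head_support = defaultdict(list)
for support_pct, confidence_pct, lift, head in rules:
    implied_head_support[head].append(confidence_pct / lift)

for head, values in sorted(implied_head_support.items()):
    print(head, [round(v, 2) for v in values])
```

If the values for one head agree to within rounding, the listing is consistent with that definition of lift.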
24Data Mining Products
- More than 50 commercial data mining tools
- Wide range of pricing
- SAS Institute's Enterprise Miner: $80k
- SPSS Inc. Clementine: $75k
- IBM Intelligent Miner: $60k
- Desktop products start at a few hundred dollars
25Data Mining Products
Data Mining Product Comparison by Algorithm
Algorithm IBM SAS SPSS
Neural Network v v v
Decision Tree v v v
Clustering v v
Association v v
Nearest Neighbour v
Kohonen Self- Organizing Map v v
26Data Mining and Privacy
- Release a limited subset of the data
- Hide attributes that are potentially related to personal information
- Release encrypted data
- Audit to detect misuse of data
- Set up a data mining controller
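The first two measures can be sketched as a small release step that samples a subset of records and strips sensitive attributes before handing data to a mining tool. The function name `release_view`, the field names, and the records are all hypothetical; this is an illustration of the idea, not a feature of Intelligent Miner.

```python
import random


def release_view(records, hidden_fields, sample_size, seed=0):
    """Return a random subset of records with the hidden fields removed."""
    rng = random.Random(seed)  # fixed seed so the release is repeatable
    subset = rng.sample(records, min(sample_size, len(records)))
    return [
        {key: value for key, value in record.items() if key not in hidden_fields}
        for record in subset
    ]


# Hypothetical customer records, invented for illustration.
customers = [
    {"name": "Ann", "zip": "10001", "age": 34, "spend": 120.0},
    {"name": "Bob", "zip": "10002", "age": 51, "spend": 80.0},
    {"name": "Cat", "zip": "10003", "age": 27, "spend": 300.0},
    {"name": "Dan", "zip": "10004", "age": 45, "spend": 45.0},
]

released = release_view(customers, hidden_fields={"name", "zip"}, sample_size=2)
print(released)
```

Dropping obvious identifiers does not by itself guarantee privacy, since the remaining attributes can sometimes be combined to re-identify a person. That is one reason the slide also lists auditing and a mining controller.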
27Summary
- Introduction to Data Mining
- A KDD Data Mining Process
- Functionalities of Intelligent Miner
- Commercial Data Mining Tools
- Data Mining and Privacy
28References
- Angoss Whitepaper. http://www.angoss.com/ProdServ/AnalyticalTools/kseeker/whitepaper.html. Retrieved on Oct 26th, 2003
- C. Clifton, D. Marks. Security and Privacy Implications of Data Mining. 1996
- D. W. Abbott, I. P. Matkovsky, J. F. Elder IV. An Evaluation of High-end Data Mining Tools
- Elder Research. http://www.rgrossman.com/faq/dm-02.htm. Retrieved on Oct 28th, 2003
- IBM. DB2 Intelligent Miner. http://www-3.ibm.com/software/data/iminer/. Retrieved on Oct 26th, 2003
- J. F. Elder, D. W. Abbott. August, 1988. A Comparison of Leading Data Mining Tools
- J. Han and M. Kamber. Data Mining: Concepts and Techniques, 2000
- http://www.cald.cs.cmu.edu/summerschool03/PrivacyPreservingDM.ppt. Retrieved on Nov 10th, 2003
- Robert Grossman. http://www.datamininglab.com/toolcomp.htmlcomparison. Retrieved on Oct 20th, 2003
- SPSS. http://www.spss.com/. Retrieved on Nov 12th, 2003
30Evolution of Database Technology
- 1960s
- Data collection, database creation, and network DBMS
- 1970s
- Relational data model, relational DBMS implementation
- 1980s
- RDBMS, advanced data models
- 1990s to 2000s
- Data mining and data warehousing, multimedia databases, and Web databases
31Data Mining On What Kind of Data?
- Data Sources
- Relational database
- Data warehouses
- Transactional databases
- WWW
- Data types
- Audio
- Image
- Text
32Output: A Decision Tree for buys_computer
[Figure: decision tree. Root test: age? with branches <30, 30..40, and >40. Internal tests: student? (yes / no) and credit rating? (fair / excellent). Leaves: yes / no.]
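A decision tree like this one reads naturally as nested conditionals. The transcript keeps the node and branch labels but not the edges, so the sketch below assumes the usual layout of this textbook example: younger customers are split on student status, the middle age group always buys, and older customers are split on credit rating. Which leaf sits under which branch is an assumption.

```python
def buys_computer(age, student, credit_rating):
    """Classify one customer by walking the (assumed) tree from the root."""
    if age < 30:
        # Assumed: students in the youngest group buy, non-students do not.
        return "yes" if student else "no"
    if age <= 40:
        # Assumed: the 30..40 branch is a pure "yes" leaf.
        return "yes"
    # Assumed: for customers over 40, a fair credit rating leads to "yes".
    return "yes" if credit_rating == "fair" else "no"


print(buys_computer(25, student=True, credit_rating="fair"))
print(buys_computer(35, student=False, credit_rating="excellent"))
print(buys_computer(50, student=False, credit_rating="excellent"))
```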
33Neural network
34Neural network
35Neural network
36Applications of Clustering
- Pattern Recognition
- Image Processing
- Economic Science (especially market research)
- WWW
- Document classification
- Cluster Weblog data to discover groups of similar
access patterns
37Data Mining and Privacy
Data Mining Tool
Mining Controller
Data warehouse
38Examples of Clustering Applications
- Marketing: help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs
- Insurance: identify groups of motor insurance policy holders with a high average claim cost
- City planning: identify groups of houses according to their house type, value, and geographical location
- Earthquake studies: observed earthquake epicenters should be clustered along continent faults
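As a rough illustration of what grouping points such as epicenters means in practice, here is a minimal k-means sketch in plain Python. k-means is used only because it is short; it is not claimed to be the clustering algorithm Intelligent Miner uses. The coordinates and starting centroids are invented.

```python
from math import dist


def assign(points, centroids):
    """Put each point into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing cluster means."""
    for _ in range(iterations):
        clusters = assign(points, centroids)
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members
            else centroid  # keep an empty cluster's centroid where it was
            for members, centroid in zip(clusters, centroids)
        ]
    return centroids, assign(points, centroids)


# Hypothetical 2-D locations forming two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])

for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

The starting centroids are passed in explicitly so the run is deterministic. Real tools typically choose them randomly or with a seeding heuristic.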
39Association
- Association and pattern analysis
- Applications
- Basket data analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc.
- Examples
- buys(x, diapers) => buys(x, beers) [0.5%, 60%]
- major(x, CS) and takes(x, DB) => grade(x, A) [1%, 75%]
40Data Mining On What Kind of Data?
- Relational databases
- Data warehouses
- Transactional databases
- Advanced DB and information repositories
- Object-oriented and object-relational databases
- Text databases and multimedia databases
- Heterogeneous and legacy databases
- WWW
41Steps of a KDD Process
- Learning the application domain
- Relevant prior knowledge and goals of application
- Creating a target data set: data selection
- Data cleaning and preprocessing (may take 60% of effort!)
- Data reduction and transformation
- Find useful features, dimensionality/variable reduction, invariant representation
- Choosing functions of data mining
- Summarization, classification, regression, association, clustering
- Choosing the mining algorithm(s)
- Data mining: search for patterns of interest
- Pattern evaluation and knowledge presentation
- Visualization, transformation, removing redundant patterns, etc.
- Use of discovered knowledge
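The steps above can be sketched as a chain of small functions, one per stage. Everything here is a toy assumption: the record layout, the function names, and the choice of simple item counting as the "mining" step. The point is only to show how selection, cleaning, transformation, mining, and evaluation feed into each other.

```python
from collections import Counter

# Hypothetical raw records, including one with a missing value.
raw_records = [
    {"customer": "a", "item": "milk", "amount": 3.0, "store": "north"},
    {"customer": "a", "item": "bread", "amount": 2.0, "store": "north"},
    {"customer": "b", "item": "milk", "amount": None, "store": "south"},
    {"customer": "b", "item": "bread", "amount": 2.5, "store": "south"},
    {"customer": "c", "item": "milk", "amount": 3.0, "store": "north"},
]


def select(records, fields):
    """Data selection: keep only the task-relevant fields."""
    return [{f: r[f] for f in fields} for r in records]


def clean(records):
    """Data cleaning: drop records with missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def transform(records):
    """Transformation: group purchased items into one basket per customer."""
    baskets = {}
    for r in records:
        baskets.setdefault(r["customer"], set()).add(r["item"])
    return baskets


def mine(baskets):
    """Data mining: count in how many baskets each item appears."""
    return Counter(item for items in baskets.values() for item in items)


def evaluate(patterns, min_count):
    """Pattern evaluation: keep only patterns that are frequent enough."""
    return {item: n for item, n in patterns.items() if n >= min_count}


selected = select(raw_records, ["customer", "item", "amount"])
knowledge = evaluate(mine(transform(clean(selected))), min_count=2)
print(knowledge)
```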
42Strength and Weakness
- Strength
- Algorithm breadth
- Graphical output
- Available for PC and mainframe environments
- Weakness
- No automation
- Data has to reside in IBM's database system