Title: Data mining exercise Clustering Lab 3
1. Data mining exercise: Clustering, Lab 3
- Winnie Lam
- Email: cswinnie_at_comp.polyu.edu.hk
- Website: http://www.comp.polyu.edu.hk/cswinnie/
- The Hong Kong Polytechnic University
- Department of Computing
2. REVIEW
- Classification modeling
- WEKA (ID3)
- Clementine (C5.0)
- Questions?
3. OVERVIEW
(Figure: the knowledge discovery process. Selection -> Target Data -> Preprocessing -> Preprocessed Data -> Transformation -> Transformed Data -> Data Mining -> Patterns -> Evaluation -> Knowledge)
4. Simplified process
- Data Understanding: define the target and discover useful data
- Data Preparation: obtain clean, useful data
- Data Mining: discover patterns
- Evaluation: apply the knowledge
5. Download files
- WEKA
- http://prdownloads.sourceforge.net/weka/weka-3-4-8a.exe
- Data file
- MyData_lab3.mdb
- http://www.comp.polyu.edu.hk/cswinnie/data/MyData_lab3.mdb
- lab3.csv
- http://www.comp.polyu.edu.hk/cswinnie/data/lab3.csv
6. Classification
With predefined classes!
7. Clustering
No classes are defined in advance!
(Figure labels: STAR, CROSS, TRIANGLE)
8. Clementine
9. Modeling Tools: Clustering
- K-means. An approach to clustering that defines k clusters and iteratively assigns records to clusters based on their distance from the mean of each cluster, until a stable solution is found.
- TwoStep. A clustering method that first preclusters the records into a large number of subclusters and then applies a hierarchical clustering technique to those subclusters to define the final clusters.
- Kohonen Networks. A type of neural network used for clustering, also known as a self-organizing map (SOM).
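The K-means description above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Clementine or WEKA implement it: the function names, the made-up sample records, and the choice of the first k records as starting means are all assumptions made for this example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two numeric records."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_of(records):
    """Component-wise mean of a non-empty list of records."""
    return tuple(sum(column) / len(records) for column in zip(*records))


def k_means(records, k, max_iterations=100):
    # Assumption: the first k records serve as the initial cluster means.
    means = list(records[:k])
    assignment = None
    for _ in range(max_iterations):
        # Assign every record to the cluster with the nearest mean.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(record, means[c]))
            for record in records
        ]
        # Stable solution: no record changed cluster since the last pass.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Recompute each mean from the records now assigned to that cluster.
        for c in range(k):
            members = [r for r, a in zip(records, assignment) if a == c]
            if members:  # keep the old mean if a cluster ends up empty
                means[c] = mean_of(members)
    return assignment, means


# Made-up two-dimensional records forming two obvious groups.
records = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.9, 8.1)]
assignment, means = k_means(records, k=2)
print(assignment)  # prints [0, 1, 0, 1, 0, 1]
```

Real tools usually pick the starting means at random, so different runs can produce different clusters.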
10. Data Understanding
Data file: MyData_lab3.mdb
Step 1: Create a Data Source (ODBC) in Control Panel
11. Data Understanding
Step 2: Import data into Clementine
- Add a Source node (Database) to Clementine
- Choose the Data Source
- Select the tables (lab3, Link, Shop_Info), one at a time
13. Data Preparation
14. Goal: Merge tables lab3 and Shop_Info
Add node: Merge (in the Record Ops palette)
Tables and their fields:
- link: TID, SHOP_CD
- lab3: TID, dt, gp1, gp2, ref_no, cl, prod_cd
- Shop_Info: dist_cd, shop_cd, staffs, manager, Area
Answer by yourself: what is/are the key(s) for merging?
15. Step 1: Merge tables lab3 and link
Add node: Merge (in the Record Ops palette)
Fields from Link
16. Step 2: Merge the result of step 1 with table Shop_Info
Add node: Merge (in the Record Ops palette)
Fields from Shop_Info
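Outside Clementine, the two merge steps can be sketched as key-based joins in Python. This is an illustrative sketch only: it assumes TID is the key shared by lab3 and link, and that SHOP_CD in link matches shop_cd in Shop_Info. The helper name `merge_on_key` and all row values are made up for the example.

```python
def merge_on_key(left_rows, right_rows, left_key, right_key):
    """Inner join: combine rows whose key values match."""
    # Index the right-hand table by its key for fast lookup.
    index = {}
    for row in right_rows:
        index.setdefault(row[right_key], []).append(row)
    merged = []
    for left in left_rows:
        for right in index.get(left[left_key], []):
            combined = dict(left)
            combined.update(right)
            merged.append(combined)
    return merged


# Made-up sample rows; the real tables have more fields and records.
lab3 = [{"TID": 1, "prod_cd": "P01"}, {"TID": 2, "prod_cd": "P02"}]
link = [{"TID": 1, "SHOP_CD": "S1"}, {"TID": 2, "SHOP_CD": "S2"}]
shop_info = [{"shop_cd": "S1", "staffs": 12}, {"shop_cd": "S2", "staffs": 30}]

# Step 1: merge lab3 with link (assumed key: TID).
step1 = merge_on_key(lab3, link, "TID", "TID")
# Step 2: merge the result with Shop_Info (assumed key: SHOP_CD / shop_cd).
step2 = merge_on_key(step1, shop_info, "SHOP_CD", "shop_cd")
print(step2[0])
```

Each merged record now carries the transaction fields together with the attributes of the shop where it took place.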
17. Goal: Add new attributes Weekday and Hour
Useful node: Derive (in the Field Ops palette)
Newly derived fields:
Weekday = datetime_day_name(datetime_weekday(dt))
Hour = datetime_hour(datetime_time(dt))
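For comparison, the same two derived fields can be sketched in Python with the standard datetime module. The function name and the sample timestamp are assumptions for illustration; the dt field is assumed to hold a full date and time.

```python
from datetime import datetime

# datetime.weekday() returns 0 for Monday through 6 for Sunday.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def derive_weekday_and_hour(dt):
    """Return (weekday name, hour of day) for a datetime value."""
    return DAY_NAMES[dt.weekday()], dt.hour


# Made-up transaction time: 1 January 2007, 14:30.
sample = datetime(2007, 1, 1, 14, 30)
weekday, hour = derive_weekday_and_hour(sample)
print(weekday, hour)  # prints: Monday 14
```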
18. Discretization
Goal: Divide the Hour field into 3 intervals (fixed-width)
- Steps
- Add a Binning node and specify the number of bins
- Add a Type node to update the information for the newly added field
- Add a Reclassify node to rename the bins to Morning, Afternoon, Evening
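Fixed-width binning followed by renaming the bins can be sketched in Python as below. This is a sketch under assumptions: the bin range 0 to 24 for hours is a guess (the Binning node may instead use the range found in the data), and the function names are hypothetical.

```python
def fixed_width_bin(value, low, high, num_bins):
    """Return the 0-based index of the equal-width bin containing value."""
    width = (high - low) / num_bins
    index = int((value - low) // width)
    # Clamp so that value == high (or an out-of-range value) gets a valid bin.
    return min(max(index, 0), num_bins - 1)


PERIOD_NAMES = ["Morning", "Afternoon", "Evening"]


def hour_to_period(hour, low=0, high=24):
    """Bin an hour into 3 fixed-width intervals and return the bin's name."""
    return PERIOD_NAMES[fixed_width_bin(hour, low, high, len(PERIOD_NAMES))]


for hour in (7, 12, 20):
    print(hour, hour_to_period(hour))
```

The same `fixed_width_bin` helper works for any numeric field, for example `fixed_width_bin(staffs, low, high, 5)` for five intervals.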
19. Data Transformation
Goal: Divide the Staff field into 5 intervals (fixed-width)
Result
20. Data Mining - Clustering
Goal: Divide the customers into 3 clusters
Add node: K-means (in the Modeling palette)
21. Data Mining - Clustering
Identify unsuitable attributes that do not help to characterize the clusters
Result
Note: You may adjust the value of k (the number of clusters)
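When trying different values of k, one common way to compare the results is the within-cluster sum of squared errors (SSE): the total squared distance of each record from the mean of its cluster. The slides do not prescribe this measure, so the sketch below is an assumption, with a made-up function name and made-up one-dimensional records.

```python
def within_cluster_sse(records, assignment):
    """Sum of squared distances from each record to its cluster mean."""
    # Group the records by their cluster label.
    clusters = {}
    for record, label in zip(records, assignment):
        clusters.setdefault(label, []).append(record)
    total = 0.0
    for members in clusters.values():
        mean = [sum(column) / len(members) for column in zip(*members)]
        total += sum((x - m) ** 2
                     for record in members
                     for x, m in zip(record, mean))
    return total


# Made-up records with one numeric attribute each.
records = [(1.0,), (3.0,), (10.0,), (12.0,)]
print(within_cluster_sse(records, [0, 0, 0, 0]))  # one cluster
print(within_cluster_sse(records, [0, 0, 1, 1]))  # two clusters
```

SSE usually keeps falling as k grows, so a lower value alone does not mean a better clustering; a typical approach is to look for the k beyond which the improvement becomes small.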
22. WEKA
23. Import data: lab3.csv
Goal: Remove useless attributes
24. Remaining attributes
25. Clustering - SimpleKMeans
Choose the clusterer (weka > clusterers > SimpleKMeans)
numClusters -- set number of clusters
seed -- random number seed
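The role of the two parameters can be illustrated with a short Python sketch. It is not WEKA's code: the function name and sample records are made up, and it only shows the general idea that the seed fixes which records are picked at random as starting cluster centers, so the same seed gives a repeatable run.

```python
import random


def pick_initial_centers(records, num_clusters, seed):
    """Pick num_clusters distinct records at random, reproducibly."""
    rng = random.Random(seed)  # a private generator controlled by the seed
    return rng.sample(records, num_clusters)


# Made-up two-dimensional records.
records = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.9, 8.1)]

first = pick_initial_centers(records, num_clusters=2, seed=10)
again = pick_initial_centers(records, num_clusters=2, seed=10)
print(first == again)  # prints True: same seed, same starting centers
```

Changing the seed may change the starting centers and therefore the final clusters, which is worth trying when a result looks odd.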
26. Result
27. Comparison
Clementine
WEKA
28. SUMMARY
- Today, you've learnt how to
- Derive new attributes
- Merge tables
- Perform discretization (binning)
- Build clustering models with Clementine and WEKA