Title: ui.korea.ac.kr
Chapter 8. Clustering Analysis
2008-04-05
Dept. of Industrial Systems Information Engineering
ui.korea.ac.kr
2Clustering Analysis
8.1 Statistical Method
8.1 Statistical Method
8.2 Basic Concept
8.3 CA/FA/MDS/DA
8.4 Clustering Analysis
8.5 Analysis Process
8.6 Cluster Decision Framework
8.7 Considerations
8.8 Validity
8.9 Hierarchical Clustering Method
8.10 K-Means Clustering Method
8.11 SAS SPSS Program
- Variable
- - Independent variable
- - Control variable
- - Dependent variable
8.1 Statistical Method
- Scale
- - Discrete: nominal scale, ordinal scale
- - Continuous: interval scale, ratio scale
- - Metric (measurable, quantitative) vs. non-metric (categorical, classified)
8.1 Statistical Method
- ? ??? ??? ???? ??
- ??????? ??? ???? ?? ??? ?????
??? ??? ???? ??? ???? - ??? ??? ??? ???
- ??? ?? ??????? ?? ????? ??????
???? - - ???? ????? ????? ??? ??? ???
? ?? ?? - - ????? ??? ??? ????? ????? ???
??? ??? ? ?? ??
?? ???? (??????) ??? ?? ???????? ?
???? ??, ?? ??? ????, ?????, ???? ?? ??, ????? ??, ????????
???? ???? ??? ??????, ????? ??????, ???????, ????, ????
???? ???? ???? ???? ??, ??, ??, ?????, ?????, ????
???? ??? ?? ?? ????, ???? ???? ???, ????, ??, ??, ??, ?????
???? ???? ???
????(???) ??? ??? X2 ??
???? ??, ???? ????
??, ???? ???? ????, ????
??, ???? ??, ???? ????
8.1 Statistical Method
- ? ???? ??? ??
- ??? ?
- ??
- ?? ??
- ? ??? ? ?? ??
- ??????
- ?????
- ? ??? ?? ??
- ????
- ?????
- ? ??? ??? ?? ??
- ????? ?? ???? ????? ????? ???
????? ??? ????? ??? - ??? ??? ????? ?? ?? ??(????,
????, ???? ?) - ??????? ?? ???? ????? ????
???? ????? ????? ???? - ???? ?????? ???? ?? ???? ?????
??
8.1 Statistical Method
[Diagram: labels not recoverable; the one surviving label is MDS]
8.2 Basic Concepts
- ? ??? ??? ??? ???? ???? ?? ??? ???(???)? ???? ??
???? ? - ??? ??? ???? ??? ??
- ? ???? ???? ???? 2? ???? ???? ?? ?? ??
- ? ??? ??? ??? ??? ??? ??? ???? ?????, ? ??? ???
?????? - ??? ??? ??? ?? ??? ??? ?? ????
- ? ?? ?? ???? ?????? ?? ???? ????? ???? ????? ? ??
- The basic intuition behind CA
- - Within-cluster variance: minimize
- - Between-cluster variance: maximize
- Main goal: maximize differences between clusters relative to variation within clusters
[Figure: clusters plotted on axes x1 and x2, annotated with within-cluster and between-cluster variation]
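The intuition above can be checked numerically. The following is a minimal sketch, assuming squared Euclidean distance and a small made-up data set (the points and the helper names `centroid`, `sq_dist` and `within_between` are illustrative, not from the slides). It splits the total variation into a within-cluster part and a between-cluster part.

```python
def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def within_between(clusters):
    """Return (within-cluster SS, between-cluster SS) for a list of clusters."""
    grand = centroid([p for c in clusters for p in c])
    within = 0.0
    between = 0.0
    for c in clusters:
        center = centroid(c)
        # Spread of the members around their own cluster center.
        within += sum(sq_dist(p, center) for p in c)
        # Spread of the cluster center around the grand mean, weighted by size.
        between += len(c) * sq_dist(center, grand)
    return within, between

# Hypothetical (x1, x2) data: two tight, well-separated groups.
clusters = [
    [(1, 1), (2, 1), (1, 2), (2, 2)],
    [(8, 8), (9, 8), (8, 9), (9, 9)],
]
within, between = within_between(clusters)
print(within, between)
```

A good partition makes the first number small relative to the second; here almost all of the variation lies between the two groups.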
8.3 CA/FA/MDS/DA
CA Vs MDS CA Vs FA CA Vs DA
??? ???? 2?? ?? MDS ??? ?? ???? Row Reduction ??? ?? ?? 2?? ?? ???? Column Reduction ??? ?? ?? ??? ?? (Vector Model)
??? Space-Distance Model Data-Reduction Independent Model Data Reduction Independent Model Metric
8.4 Clustering Analysis
- Cluster Analysis
- - Similarity/dissimilarity
- - Cluster types: disjoint, hierarchical, overlapping, fuzzy
[Figure: data matrix of objects O1-O5 by variables X1-X4]
- Possible bases for segmentation
- - Dimensions that are outputs of factor analysis
- - Exploratory research
- - Price sensitivities
- - Heavy/light users
- - Demographic variables
- - Psychographic variables
8.4 Clustering Analysis
- ? ?? ?? ???? ??? ?? ? ???? ?
- ? ???? ??
- ??? ??? ?? ??? ?? ???
- ?? ?? ??? ?? ???? ?? ?? ??? ?? ??
- ? ????? ?? ??
- ??? ??????? ???? ??? ???? ?? ???, ??? ???
??? ???? - ????? ???????
- ??? ?? ?????(Multicollinearity)? ??? ?? ???
?? ? ??
??, ?? ???? ???? ?? ???
8.4 Clustering Analysis
- Cluster Method
- - Few variables (P < 3): visual examination
- - Hierarchical cluster analysis (HCA)
- - K-means cluster analysis (K-MCA)
- Similarity, distance, dissimilarity
- Determine Similarity Measures
- - Correlational measures
- - Distance measures (Euclidean, city-block)
- - Impact of unstandardized data
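To make the last point concrete, here is a small sketch of the two distance measures named above and of what standardization does to them. The three objects are hypothetical values chosen to resemble a proportion next to a variable measured in tens. `standardize` uses the sample standard deviation, which is an assumption of this sketch.

```python
import math
import statistics

def euclidean(a, b):
    """Straight-line distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def city_block(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def standardize(rows):
    """Rescale every column to mean 0 and sample standard deviation 1."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(row, means, sds)) for row in rows]

# Hypothetical objects: b differs from a only on x1, c differs from a only on x2.
a, b, c = (0.01, 30.0), (0.07, 30.0), (0.01, 40.0)

raw_ab, raw_ac = euclidean(a, b), euclidean(a, c)    # x2 dominates the raw distances
za, zb, zc = standardize([a, b, c])
std_ab, std_ac = euclidean(za, zb), euclidean(za, zc)  # both variables now count equally
print(raw_ab, raw_ac, std_ab, std_ac)
```

On the raw data, object a looks far closer to b than to c only because x1 is measured in small units. After standardization the two distances are the same.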
8.5 Analysis Process
- ? ????? ??
- 1. ??? ?? ??? ??? ?? ???? ??? ??? ????
- - ????? ???? ??(??? ??? ?? ??)
- - ????? ??? ??? ??? ?? ??
- 2. ???? ?? ??? ???? ??? ??? ????(??? ????)
- ??? ????(Similarity / Dissimilarity)? ?? ??
- - Euclidean distance
- - Squared Euclidean distance
- - Mahalanobis distance
- - Minkowski distance
- ????? ? ??? ?? ??? ??? ????? ?? ??
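Three of the listed measures are closely related. As a minimal sketch (the function names are illustrative), the Minkowski distance with exponent p gives the city-block distance at p = 1 and the Euclidean distance at p = 2, and the squared Euclidean distance simply omits the square root. The Mahalanobis distance also needs the inverse covariance matrix of the variables and is left out here.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p (p >= 1) between two points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def squared_euclidean(a, b):
    """Euclidean distance without the final square root."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

u, v = (0.0, 0.0), (3.0, 4.0)
print(minkowski(u, v, 1))       # city-block
print(minkowski(u, v, 2))       # Euclidean
print(squared_euclidean(u, v))  # squared Euclidean
```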
8.5 Analysis Process
- ? ????? ??
- 3. Two Types of Algorithms
- Hierarchical algorithms
- - Agglomerative (build-up) methods: results from an earlier stage are always nested within the results at later stages
- - Divisive methods: start with one big cluster and break it apart
- - Dendrograms or tree graphs: read left to right, or vice versa
- Nonhierarchical algorithms
- - K-means clustering method
- 4. ?? ??? ??? ??
- 5. ?? ??
- ? ????? ????? ?? ???? ???? ??? ?? ??? ?
???? ??? - ??? ?? ? ????
8.6 Cluster Decision Framework
[Flowchart nodes: Research Problem; Research Design; Assumptions; Similarity Measure (Metric / Non Metric; Pattern or Proximity?; Correlation, Distance, Associations); Algorithm? (Hierarchical, Non Hierarchical, Combination); How many clusters formed?; Cluster Respecification? (Yes / No); Interpret Clusters; Validate and Profile]
8.7 Consideration
- ? ??? ??? ???? ??
- ??? ??
- - ??? ???? ??
- - ??? ????? ??? ??? ??? ????
- - Tree??? ???? ???? ???? ??? ???? ??? ?? ?
?? - - ??? ????? ??? ??
-
- ???? ??
- - ???? ??? ??
- - ???? 1.???, ???? ???. 2.??? ??? ??? ??.
3.??? ?? - - 2?? ?? ??? ? ??? ???? (???? ??? ???? ????)
- - ??? ???? ????? ?? ??? ??? ?? ? ?? (?,
?????-?-?) - - ?????? ??? ? ??? ??, ??? ???? ??? ??(???
??? ??)
8.8 Validity
- ? ????? ??? ?? (?? ???)
- ???? ??? ?? ????? ?? ???
- - ???? ??? ?? ???
- ????? ??? ???? ????? ?? ?? ?? ?? ?? ???
- ???? ? ????? ??(?? ??? ??? ??) - ????? ????
??? - ????
- 1. ???? ??? ??-???? ?? ?? ??? ???? ?? ??
- 2. ??? ??? ??
- - ??? ??? ??? 2???? ???? ??? ????? ?? ???
- - ???? ????? ?? ??? ?? ??? ???? ?? ???? ?
??? ??? ?? ?? - 3. ??? ??? ?? ?? ????? ???? ??? ??
8.9 Hierarchical Clustering Method
- Hierarchical Clustering Method (HCA)
- - Agglomerative Hierarchical Method (AHM): polythetic method
- - Divisive method: monothetic method
- Single Linkage Method
- Complete Linkage Method
- Average Linkage Method
- Centroid Linkage Method
- Median Linkage Method
- Ward Method
[Figure labels: AHM, Objects, N, Divisive]
8.9 Hierarchical Clustering Method
- Methods of Clustering
- - Minimum distance (single linkage)
- - Maximum distance (complete linkage)
- - Average distance (average linkage), the most common
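The three rules differ only in how the pairwise distances between two clusters are summarized. A minimal sketch, assuming Euclidean distance between objects (the function name `linkage_distance` and the sample clusters are illustrative):

```python
import math

def linkage_distance(cluster_a, cluster_b, method):
    """Distance between two clusters under single, complete or average linkage."""
    pairwise = [math.dist(a, b) for a in cluster_a for b in cluster_b]
    if method == "single":      # minimum distance: closest pair
        return min(pairwise)
    if method == "complete":    # maximum distance: farthest pair
        return max(pairwise)
    if method == "average":     # mean over all cross-cluster pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown method: {method}")

# Hypothetical one-variable clusters.
A = [(0.0,), (1.0,)]
B = [(4.0,), (6.0,)]
for m in ("single", "complete", "average"):
    print(m, linkage_distance(A, B, m))
```

The pairwise distances here are 4, 6, 3 and 5, so the three rules give 3, 6 and 4.5.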
8.9 Hierarchical Clustering Method
- Other Agglomerative Methods of Clustering
- - Ward's method
- - Centroid method
[Figure labels: c.g (center of gravity) for each cluster]
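Both methods work from cluster centers of gravity. As a hedged sketch (helper names and data are illustrative), the centroid method measures the distance between the two centroids. Ward's method measures how much the total within-cluster sum of squares (SSE) would grow if the two clusters were merged, which equals n1*n2/(n1+n2) times the squared centroid distance.

```python
import math

def centroid(points):
    """Center of gravity of a cluster."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def sse(points):
    """Sum of squared distances of the members to their centroid."""
    c = centroid(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)

def centroid_distance(a, b):
    """Centroid method: distance between the two centers of gravity."""
    return math.dist(centroid(a), centroid(b))

def ward_increase(a, b):
    """Ward's method: increase in total SSE caused by merging a and b."""
    n1, n2 = len(a), len(b)
    return n1 * n2 / (n1 + n2) * centroid_distance(a, b) ** 2

# Hypothetical clusters.
A = [(0.0, 0.0), (2.0, 0.0)]
B = [(5.0, 0.0), (7.0, 0.0)]
print(centroid_distance(A, B))
print(ward_increase(A, B), sse(A + B) - sse(A) - sse(B))
```

The last line prints the same value twice, confirming that the shortcut formula matches the SSE computed directly.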
8.9 Hierarchical Clustering Method
- Example: Single Linkage Method
1. The smallest distance is d13 = 1.0, so objects 1 and 3 are joined first.
2. The next smallest distance is d24 = 3.0, so objects 2 and 4 are joined.
3. Cluster (2, 4) is joined with object 5 to form (2, 4, 5).
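The steps of this example can be replayed in code. The slide's full distance matrix did not survive, so the matrix below is hypothetical: only d13 = 1.0 and d24 = 3.0 come from the slide, and the other entries were chosen so that object 5 joins (2, 4) next. `single_linkage` is a naive illustrative routine, not the SAS algorithm.

```python
from itertools import combinations

def single_linkage(dist, n):
    """Agglomerate objects 1..n; dist maps frozenset({i, j}) to a distance."""
    clusters = [frozenset([i]) for i in range(1, n + 1)]
    history = []

    def gap(pair):
        # Single linkage: the closest pair of members decides the distance.
        a, b = pair
        return min(dist[frozenset((i, j))] for i in a for j in b)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=gap)
        history.append((sorted(a | b), gap((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return history

# Hypothetical distances, except d13 and d24 which are taken from the slide.
raw = {(1, 2): 7.0, (1, 3): 1.0, (1, 4): 9.0, (1, 5): 8.0, (2, 3): 6.0,
       (2, 4): 3.0, (2, 5): 5.0, (3, 4): 8.0, (3, 5): 7.0, (4, 5): 4.0}
dist = {frozenset(k): v for k, v in raw.items()}

history = single_linkage(dist, 5)
for members, level in history:
    print(members, level)
```

The merge order is (1, 3), then (2, 4), then (2, 4, 5), and finally all five objects. Each entry records the distance at which the merge happened, which is the height a dendrogram would draw.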
8.9 Hierarchical Clustering Method
? Example Single Linkage Method
Dendrogram
8.9 Hierarchical Clustering Method
- Centroid Linkage Method (N = 5, 2 variables)
[Table: Subject by Variable]
1. Compute the distance matrix D.
2. d12 = 1.0 is the smallest distance, so subjects 1 and 2 are joined into cluster (1, 2), and the centroid of (1, 2) is computed.
3. Repeat.
[Table: Subject by Variable]
8.9 Hierarchical Clustering Method
- Centroid Linkage Method (N = 5, 2 variables)
Dendrogram
8.9 Hierarchical Clustering Method
- Single Linkage Method
- - SAS code: PROC CLUSTER ... METHOD=SINGLE
- Complete Linkage Method
- - SAS code: PROC CLUSTER ... METHOD=COMPLETE
- Average Linkage Method
- - SAS code: PROC CLUSTER ... METHOD=AVERAGE
- Centroid Linkage Method
- - SAS code: PROC CLUSTER ... METHOD=CENTROID
- Median Linkage Method
- - SAS code: PROC CLUSTER ... METHOD=MEDIAN
- Ward Method
- - SAS code: PROC CLUSTER ... METHOD=WARD
8.10 K-Means Clustering Method
- ? K-???? ??? ??? ??(Sequential Threshold
Method) - ???? ?????? ?? ???
- ??? ????? K? ???? ??? ??
- K? ?? ?? ????? ?????? ??
- ?? ???? ??? ?? ? ?? ?? ??
K?? ???? ??
???? ??/?? ??
? ??? ?? ?? ????
????
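As a rough illustration of K-means, here is a minimal sketch of the common batch variant: assign every object to its nearest seed, recompute each seed as the mean of its members, and repeat until assignments stop changing. This is not necessarily the sequential threshold procedure named on the slide, and it is not the SAS FASTCLUS algorithm. The data and seeds are hypothetical.

```python
def kmeans(points, seeds, max_iter=10):
    """Batch K-means; returns (labels, centers). K is the number of seeds."""
    centers = [tuple(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(range(len(centers)),
                key=lambda k: sum((x - c) ** 2 for x, c in zip(p, centers[k])))
            for p in points
        ]
        if new_labels == labels:   # no object changed cluster: converged
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers

# Hypothetical data with two obvious groups; the seeds are two of the objects.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]
labels, centers = kmeans(points, seeds=[points[0], points[3]])
print(labels)
print(centers)
```

The result depends on K and on the starting seeds, so in practice the seeds are chosen with care or the run is repeated from several starts.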
8.11 SAS SPSS Program
Hierarchical Clustering Method SAS ? ??????
????? ?? ??? ???? ???? ???? ?? ??. ??? ????? ???
?? 6?? ??? ????? ???(Subject 10). (X1) ??? ??
?? (X2) ??? ??? ??? ??? ?? (X3) ????? ???
?? (X4) ??? ?? ??? ?????? ?? (X5) ??? ???
?? (X6) ??? ????? ?? ?? ? ??
7 Likert Scale
?? ?? ??(1)--------------??(4)---------------
-?? ??(7)
8.11 SAS SPSS Program
Hierarchical Clustering Method
8.11 SAS SPSS Program
Hierarchical Clustering Method SAS Code
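/* Program 1: variables on different scales, standardized with PROC STANDARD */
DATA QUEST;
  INPUT X1-X6;
  CARDS;
0.06 40 7 3 2 3
0.02 30 1 4 5 4
0.07 20 6 4 1 3
0.04 60 4 5 3 6
0.01 30 2 2 6 4
0.06 40 6 3 3 4
0.05 30 6 3 3 4
0.07 30 7 4 1 4
0.02 40 3 3 6 3
0.03 50 3 6 4 6
;
RUN;
/* Standardize to mean 0 and standard deviation 1 */
PROC STANDARD DATA=QUEST MEAN=0 STD=1 OUT=TWO;
  VAR X1-X6;
RUN;
PROC CLUSTER DATA=TWO METHOD=CENTROID OUTTREE=TREE;
  VAR X1-X6;
RUN;

/* Program 2: 7-point Likert data, standardized by the STD option of PROC CLUSTER */
DATA QUEST;
  INPUT X1-X6;
  CARDS;
6 4 7 3 2 3
2 3 1 4 5 4
7 2 6 4 1 3
4 6 4 5 3 6
1 3 2 2 6 4
6 4 6 3 3 4
5 3 6 3 3 4
7 3 7 4 1 4
2 4 3 3 6 3
3 5 3 6 4 6
;
RUN;
PROC CLUSTER DATA=QUEST STD METHOD=CENTROID OUTTREE=TWO;
  VAR X1-X6;
RUN;
PROC TREE DATA=TWO HORIZONTAL;
RUN;

/* Other linkage options: METHOD=SINGLE (single linkage), METHOD=COMPLETE (complete linkage), METHOD=AVERAGE (average linkage) */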
8.11 SAS SPSS Program
Hierarchical Clustering Method SAS ??
? ?????? ???
Centroid Hierarchical Cluster Analysis
The data have been standardized to mean 0 and variance 1
Root-Mean-Square Total-Sample Standard Deviation = 1
Root-Mean-Square Distance Between Observations = 3.464102

Number of    Clusters Joined    Frequency of    Normalized
Clusters                        New Cluster     Centroid Distance    Tie
    9        OB6      OB7            2             0.281052
    8        OB1      CL9            3             0.361764
    7        OB3      OB8            2             0.385276
    6        OB4      OB10           2             0.428126
    5        OB5      OB9            2             0.476894
    4        OB2      CL5            3             0.490703
    3        CL8      CL7            5             0.497510
    2        CL3      CL4            8             1.016941
    1        CL2      CL6           10             1.029886
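The cluster history can be turned into cluster memberships by replaying the joins until the desired number of clusters remains. A minimal sketch, assuming the SAS naming convention in which the cluster formed when N clusters remain is called CLN (the `cut` helper is illustrative, not a SAS feature):

```python
# (clusters remaining after the join, first cluster joined, second cluster joined)
HISTORY = [
    (9, "OB6", "OB7"), (8, "OB1", "CL9"), (7, "OB3", "OB8"),
    (6, "OB4", "OB10"), (5, "OB5", "OB9"), (4, "OB2", "CL5"),
    (3, "CL8", "CL7"), (2, "CL3", "CL4"), (1, "CL2", "CL6"),
]

def cut(history, n_obs, k):
    """Replay the joins until k clusters remain; return sorted memberships."""
    clusters = {f"OB{i}": {i} for i in range(1, n_obs + 1)}
    for remaining, a, b in history:
        if remaining < k:
            break
        # The merged cluster is named after the number of clusters remaining.
        clusters[f"CL{remaining}"] = clusters.pop(a) | clusters.pop(b)
    return sorted(sorted(members) for members in clusters.values())

print(cut(HISTORY, 10, 3))
print(cut(HISTORY, 10, 2))
```

The three-cluster cut gives (1, 3, 6, 7, 8), (2, 5, 9) and (4, 10). The two-cluster cut merges the first two groups and leaves (4, 10) on its own.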
8.11 SAS SPSS Program
Hierarchical Clustering Method SAS Code
8.11 SAS SPSS Program
Hierarchical Clustering Method SAS Code
- Reading the dendrogram: observations 6 and 7 are joined, as are 3 and 8, 5 and 9, and 4 and 10.
- - With 3 clusters: (6, 7, 1, 3, 8), (5, 9, 2), (4, 10)
- - With 2 clusters: (6, 7, 1, 3, 8, 5, 9, 2), (4, 10)
- Cluster 1 (6, 7, 1, 3, 8): item means of 6.20, 6.40 and 2.00, which match X1, X3 and X5 in the Likert data
- Clusters 2 and 3: descriptions not recoverable
8.11 SAS SPSS Program
Hierarchical Clustering Method SAS Code
? ?? ? ?? ????? ??
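Standardized data

Number of    Clusters Joined    Frequency of    Normalized
Clusters                        New Cluster     Centroid Distance
    9        OB6      OB7            2             0.281052
    8        OB1      CL9            3             0.361764
    7        OB3      OB8            2             0.385276
    6        OB4      OB10           2             0.428126
    5        OB5      OB9            2             0.476894
    4        OB2      CL5            3             0.490703
    3        CL8      CL7            5             0.497510
    2        CL3      CL4            8             1.016941
    1        CL2      CL6           10             1.029886

Unstandardized data

Number of    Clusters Joined    Frequency of    Normalized
Clusters                        New Cluster     Centroid Distance
    9        OB1      OB6            2             0.101674
    8        OB2      OB5            2             0.143790
    7        OB7      OB8            2             0.143794
    6        CL9      OB9            3             0.292047
    5        CL8      CL7            4             0.359483
    4        CL6      CL5            7             0.593705
    3        OB4      OB10           2             0.595757
    2        CL4      OB3            8             0.860206
    1        CL2      CL3           10             1.336713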
8.11 SAS SPSS Program
Hierarchical Clustering Method SPSS
- Hierarchical (versus K-means)
- Cluster: Cases
- Display: Statistics / Plots
- Statistics
- - Agglomeration schedule (distance between clusters)
- - Proximity matrix
- Cluster Membership (none, single, range: from -- to -- clusters)
- Plots (dendrograms or icicle plots)
- Method
- - Cluster method
- - Measure (interval, counts, binary)
- - Transform values or measures
- Save
8.11 SAS SPSS Program
Hierarchical Clustering Method SPSS
??? ??
???? ?? Analyze?Classify
8.11 SAS SPSS Program
Hierarchical Clustering Method SPSS
Statistics Plots Method Save
?? ?? ????
8.11 SAS SPSS Program
Hierarchical Clustering Method SPSS
Statistics Plots Method Save
8.11 SAS SPSS Program
Hierarchical Clustering Method SPSS
??? ??? ??
Dendrogram
8.11 SAS SPSS Program
Non-Hierarchical Clustering Method
8.11 SAS SPSS Program
K-Means Clustering SAS Code
/* Program 1: 7-point Likert data */
DATA QUEST;
  INPUT X1-X6;
  CARDS;
6 4 7 3 2 3
2 3 1 4 5 4
7 2 6 4 1 3
4 6 4 5 3 6
1 3 2 2 6 4
6 4 6 3 3 4
5 3 6 3 3 4
7 3 7 4 1 4
2 4 3 3 6 3
3 5 3 6 4 6
;
RUN;
PROC STANDARD DATA=QUEST MEAN=0 STD=1 OUT=TWO;
  VAR X1-X6;
RUN;
PROC FASTCLUS DATA=TWO LIST MAXCLUSTERS=3 MAXITER=10;
  VAR X1-X6;
RUN;

/* Program 2: variables on different scales */
DATA QUEST;
  INPUT X1-X6;
  CARDS;
0.06 40 7 3 2 3
0.02 30 1 4 5 4
0.07 20 6 4 1 3
0.04 60 4 5 3 6
0.01 30 2 2 6 4
0.06 40 6 3 3 4
0.05 30 6 3 3 4
0.07 30 7 4 1 4
0.02 40 3 3 6 3
0.03 50 3 6 4 6
;
RUN;
PROC STANDARD DATA=QUEST MEAN=0 STD=1 OUT=TWO;
  VAR X1-X6;
RUN;
PROC FASTCLUS DATA=TWO LIST MAXCLUSTERS=3 MAXITER=10;
  VAR X1-X6;
RUN;
??? ??
? ??? ??? ????? ??? ??? ?? Seed??? ??
?? ?? ??
Seed? ???? ?? ?? ?? ?
8.11 SAS SPSS Program
K-Means Clustering Results
FASTCLUS Procedure
Replace=FULL  Radius=0  Maxclusters=3  Maxiter=10

Initial Seeds
Cluster        X1          X2          X3          X4          X5          X6
   1       -0.13553     1.98361    -0.23009     1.12117    -0.21764     1.72648
   2       -1.49079    -0.60371    -1.15045    -1.46615     1.41468    -0.09087
   3        1.21974    -1.46615     0.69027     0.25873    -1.30586    -0.99954

Minimum Distance Between Initial Seeds = 4.694619

                          Relative Change in Cluster Seeds
Iteration    Criterion        1          2          3
    1         0.6465       0.1580     0.2174     0.3084
    2         0.4157       0          0          0

Convergence criterion is satisfied.
8.11 SAS SPSS Program
K-Means Clustering???? ???? ??
Cluster Listing (standardized data)
Obs    Cluster    Distance from Seed
  1       3           0.98828
  2       2           1.13323
  3       3           1.44798
  4       1           0.74154
  5       2           1.02068
  6       3           1.03211
  7       3           0.95115
  8       3           0.96568
  9       2           0.98229
 10       1           0.74154
Criterion Based on Final Seeds = 0.41566
Cluster 1 (4, 10), Cluster 2 (2, 5, 9), Cluster 3 (1, 3, 6, 7, 8)

Cluster Listing (unstandardized data)
Obs    Cluster    Distance from Seed
  1       1           5.6886
  2       1           6.7350
  3       3           6.7495
  4       2           5.0744
  5       1           6.5544
  6       1           4.7917
  7       3           3.6818
  8       3           3.4960
  9       1           4.4227
 10       2           5.0744
Criterion Based on Final Seeds = 2.1834
Cluster 1 (1, 2, 5, 6, 9), Cluster 2 (4, 10), Cluster 3 (3, 7, 8)