ui.korea.ac.kr - PowerPoint PPT Presentation

Title: Lecture Note
Author: In Soo Kim
Keywords: Lecture Note
Last modified by: Insoo Kim
Created: 1/4/2005 2:48:01 PM
Document presentation format: PowerPoint PPT presentation

Slides: 42
Provided by: InSo5
1
Chapter 8. Clustering Analysis
2008-04-05
Dept. of Industrial Systems Information
Engineering
ui.korea.ac.kr
2
Clustering Analysis
8.1 Statistical Method
8.1 Statistical Method
8.2 Basic Concept
8.3 CA/FA/MDS/DA
8.4 Clustering Analysis
8.5 Analysis Process
8.6 Cluster Decision Framework
8.7 Considerations
8.8 Validity
8.9 Hierarchical Clustering Method
8.10 K-Means Clustering Method
8.11 SAS / SPSS Program
  • Variable
  • Independent variable
  • Control variable
  • Dependent variable

3
Clustering Analysis
Chapter - 8 -
8.1 Statistical Method
  • Scale
  • ?? ?? ??? ??
  • ????? ??? ? ??? ??
  • Discrete scales
    - Nominal scale
    - Ordinal scale
  • Continuous scales
    - Interval scale
    - Ratio scale
  • ? ?? ???
  • ? ??? ??? ?? ??
  • Metric (measurable, quantitative)
  • Non-metric (categorical, classified)

4
Clustering Analysis
8.1 Statistical Method
  • ? ??? ??? ???? ??
  • ??????? ??? ???? ?? ??? ?????
    ??? ??? ???? ??? ????
  • ??? ??? ??? ???
  • ??? ?? ??????? ?? ????? ??????
    ????
  • - ???? ????? ????? ??? ??? ???
    ? ?? ??
  • - ????? ??? ??? ????? ????? ???
    ??? ??? ? ?? ??

?? ???? (??????) ??? ?? ???????? ?
???? ??, ?? ??? ????, ?????, ???? ?? ??, ????? ??, ????????
???? ???? ??? ??????, ????? ??????, ???????, ????, ????
???? ???? ???? ???? ??, ??, ??, ?????, ?????, ????
???? ??? ?? ?? ????, ???? ???? ???, ????, ??, ??, ??, ?????
???? ???? ???
????(???) ??? ??? X2 ??
???? ??, ???? ????
??, ???? ???? ????, ????
??, ???? ??, ???? ????
5
Clustering Analysis
8.1 Statistical Method
  • ? ???? ??? ??
  • ??? ?
  • ??
  • ?? ??
  • ? ??? ? ?? ??
  • ??????
  • ?????
  • ? ??? ?? ??
  • ????
  • ?????
  • ? ??? ??? ?? ??
  • ????? ?? ???? ????? ????? ???
    ????? ??? ????? ???
  • ??? ??? ????? ?? ?? ??(????,
    ????, ???? ?)
  • ??????? ?? ???? ????? ????
    ???? ????? ????? ????
  • ???? ?????? ???? ?? ???? ?????
    ??

6
Clustering Analysis
8.1 Statistical Method
  • ? ???? ??

????
???? ??
???? ????
????
????
????? ?? ??
????
???
???
? ? ? ?
???? ????
????
???
??
????
??? ????
????
??
???
????
??
??
????
??????? ????
????
MDS
7
Clustering Analysis
8.2 Basic Concepts
  • ? ??? ??? ??? ???? ???? ?? ??? ???(???)? ???? ??
    ???? ?
  • ??? ??? ???? ??? ??
  • ? ???? ???? ???? 2? ???? ???? ?? ?? ??
  • ? ??? ??? ??? ??? ??? ??? ???? ?????, ? ??? ???
    ??????
  • ??? ??? ??? ?? ??? ??? ?? ????
  • ? ?? ?? ???? ?????? ?? ???? ????? ???? ????? ? ??
  • The basic intuition behind cluster analysis (C.A.)
  • Within-cluster variance: minimize
  • Between-cluster variance: maximize

Main goal: maximize differences between clusters relative to variation within clusters.
[Figure: scatter plot on axes x1 and x2 showing within-cluster variation and between-cluster variation]
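The goal stated on this slide can be checked numerically. The following is a minimal Python sketch, not part of the original lecture: the data points and the names `within_between`, `good`, and `poor` are hypothetical. It splits the total sum of squares into a within-cluster part and a between-cluster part, so a better grouping shows a smaller within value and a larger between value.

```python
from statistics import mean


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(mean(col) for col in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def within_between(clusters):
    """Return (within-cluster SS, between-cluster SS) for a list of clusters."""
    all_points = [p for cluster in clusters for p in cluster]
    grand = centroid(all_points)
    within = sum(sq_dist(p, centroid(c)) for c in clusters for p in c)
    between = sum(len(c) * sq_dist(centroid(c), grand) for c in clusters)
    return within, between


# Hypothetical (x1, x2) observations grouped in two different ways.
good = [[(1, 1), (1, 2), (2, 1)], [(8, 8), (9, 8), (8, 9)]]
poor = [[(1, 1), (8, 8), (2, 1)], [(1, 2), (9, 8), (8, 9)]]

for name, clusters in (("good", good), ("poor", poor)):
    w, b = within_between(clusters)
    print(f"{name}: within={w:.2f} between={b:.2f} ratio={b / w:.2f}")
```

Both groupings share the same total sum of squares, so lowering the within-cluster part necessarily raises the between-cluster part.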
8
Clustering Analysis
8.3 CA/FA/MDS/DA
CA Vs MDS CA Vs FA CA Vs DA
??? ???? 2?? ?? MDS ??? ?? ???? Row Reduction ??? ?? ?? 2?? ?? ???? Column Reduction ??? ?? ?? ??? ?? (Vector Model)
??? Space-Distance Model Data-Reduction Independent Model Data Reduction Independent Model Metric
9
Clustering Analysis
8.4 Clustering Analysis
Variables X1 X2 X3 X4
  • ? Cluster Analysis
  • ???? ?? ?? ?? ?? ???? ??
  • - P?? ??? ??? N?? ???? P????? ??? N?? ?
  • - ????(Similarity/Dissimilarity )? ??
  • ????? ??? ??, ??, ?? ?? ??? ??? ???? ????
  • ??? ??
  • - ?? ???(Disjoint) ?? ?? ?? ? ???? ??
  • - ??? (Hierarchical) ?? ? ??? ?? ??? ????
    ?? ?? ??? ???? ??
  • - ??(Overlapping)?? ?? ??? ??? ? ??? ??? ??
  • - ??(Fuzzy)?? ???? ???? ????(????, ???, ??
    ?? ??? ???)

Objects O1 O2 O3 O4 O5
  • Possible bases for segmentation
    - Dimensions that are outputs of factor analysis
    - Exploratory research
    - Price sensitivities
    - Heavy/light users
    - Demographic variables
    - Psychographic variables
10
Clustering Analysis
8.4 Clustering Analysis
  • ? ?? ?? ???? ??? ?? ? ???? ?
  • ? ???? ??
  • ??? ??? ?? ??? ?? ???
  • ?? ?? ??? ?? ???? ?? ?? ??? ?? ??
  • ? ????? ?? ??
  • ??? ??????? ???? ??? ???? ?? ???, ??? ???
    ??? ????
  • ????? ???????
  • ??? ?? ?????(Multicollinearity)? ??? ?? ???
    ?? ? ??

??, ?? ???? ???? ?? ???
11
Clustering Analysis
8.4 Clustering Analysis
? Cluster Method ???? ??? ? ? ??? ?? ??? ??
??(P < 3) Visual Examination ????? ?? ????
?? ??(??? ?? ??? ? ??? ??? ??? ??) - ???
????(HCA) ?? ? ??? ????? ???? - ????
????(K-MCA) ???(Similarity)? ??(Distance)?
?? - ???? ?? ? ??? ???? ????? ? ??? ?? ???
??? ????? ?? ??? - ??? ?? ? ???
????(Dissimilarity)? ??
  • Determine Similarity Measures
    - Correlational measures
    - Distance measures (Euclidean, city-block)
    - Impact of unstandardized data
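To illustrate the impact of unstandardized data on distance measures, here is a small Python sketch. It reuses the X1 and X2 columns of the raw-scale data from the SAS example later in this transcript; the function names are hypothetical, and standardization uses the sample standard deviation as an assumption about what the slides intend.

```python
from math import sqrt
from statistics import mean, stdev

# X1 and X2 for the 10 subjects, on their original scales.
raw = [(0.06, 40), (0.02, 30), (0.07, 20), (0.04, 60), (0.01, 30),
       (0.06, 40), (0.05, 30), (0.07, 30), (0.02, 40), (0.03, 50)]


def euclidean(a, b):
    """Straight-line distance."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def city_block(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def standardize(rows):
    """Rescale every column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(r, mus, sds)) for r in rows]


z = standardize(raw)
for label, data in (("raw", raw), ("standardized", z)):
    d7 = euclidean(data[0], data[6])   # subject 1 vs subject 7
    d9 = euclidean(data[0], data[8])   # subject 1 vs subject 9
    print(f"{label}: d(1,7)={d7:.4f} d(1,9)={d9:.4f}")
```

On the raw scale X2 dominates, so subject 9 looks far closer to subject 1 than subject 7 does; after standardization the order reverses because X1 now contributes on an equal footing.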
12
Clustering Analysis
8.5 Analysis Process
  • ? ????? ??
  • 1. ??? ?? ??? ??? ?? ???? ??? ??? ????
  • - ????? ???? ??(??? ??? ?? ??)
  • - ????? ??? ??? ??? ?? ??
  • 2. ???? ?? ??? ???? ??? ??? ????(??? ????)
  • ??? ????(Similarity / Dissimilarity)? ?? ??
  • - Euclidean Distance
  • - Squared Euclidean Distance
  • - Mahalanobis Distance
  • - Minkowski Distance
  • ????? ? ??? ?? ??? ??? ????? ?? ??
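The distance measures listed above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the Mahalanobis version is restricted to two variables with the variances and covariance passed in by hand, so that no matrix library is needed.

```python
def minkowski(a, b, p):
    """Minkowski distance; p=1 gives city-block, p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def squared_euclidean(a, b):
    """Euclidean distance without the final square root."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mahalanobis_2d(a, b, sxx, syy, sxy):
    """Mahalanobis distance for two variables.

    sxx and syy are the variances and sxy the covariance of the two
    variables; the 2x2 covariance matrix is inverted in closed form.
    """
    dx, dy = a[0] - b[0], a[1] - b[1]
    det = sxx * syy - sxy ** 2
    d2 = (dx * dx * syy - 2 * dx * dy * sxy + dy * dy * sxx) / det
    return d2 ** 0.5


p1, p2 = (0, 0), (3, 4)
print(minkowski(p1, p2, 1))               # city-block
print(minkowski(p1, p2, 2))               # Euclidean
print(squared_euclidean(p1, p2))
print(mahalanobis_2d(p1, p2, 1, 1, 0))    # identity covariance
```

With an identity covariance matrix the Mahalanobis distance reduces to the Euclidean distance, which is a convenient sanity check.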

13
Clustering Analysis
8.5 Analysis Process
  • ? ????? ??
  • 3. ???? ?? ?? ??: Two Types of Algorithms
  • Hierarchical algorithms
    • Agglomerative (build-up) methods
      - Results from earlier stages are always nested within the results at later stages
    • Divisive methods
      - Start with one big cluster and break it apart
    • Dendrograms or tree graphs
      - Read left to right, or vice versa
  • Nonhierarchical algorithms
    • K-means clustering method
  • 4. ?? ??? ??? ??
  • 5. ?? ??
  • ? ????? ????? ?? ???? ???? ??? ?? ??? ?
    ???? ???
  • ??? ?? ? ????

14
Clustering Analysis
8.6 Cluster Decision Framework
[Figure: cluster analysis decision flowchart]
Research Problem → Research Design → Similarity Measure
  Metric: Pattern or Proximity? → Correlation or Distance
  Non Metric: Associations
→ Assumptions → Algorithm? (Hierarchical, Non Hierarchical, or Combination)
→ How many Clusters formed? → Cluster Respecification? (Yes / No)
→ Interpret Clusters → Validate and Profile
15
Clustering Analysis
8.7 Consideration
  • ? ??? ??? ???? ??
  • ??? ??
  • - ??? ???? ??
  • - ??? ????? ??? ??? ??? ????
  • - Tree??? ???? ???? ???? ??? ???? ??? ?? ?
    ??
  • - ??? ????? ??? ??
  • ???? ??
  • - ???? ??? ??
  • - ???? 1.???, ???? ???. 2.??? ??? ??? ??.
    3.??? ??
  • - 2?? ?? ??? ? ??? ???? (???? ??? ???? ????)
  • - ??? ???? ????? ?? ??? ??? ?? ? ?? (?,
    ?????-?-?)
  • - ?????? ??? ? ??? ??, ??? ???? ??? ??(???
    ??? ??)

16
Clustering Analysis
8.8 Validity
  • ? ????? ??? ?? (?? ???)
  • ???? ??? ?? ????? ?? ???
  • - ???? ??? ?? ???
  • ????? ??? ???? ????? ?? ?? ?? ?? ?? ???
  • ???? ? ????? ??(?? ??? ??? ??) - ????? ????
    ???
  • ????
  • 1. ???? ??? ??-???? ?? ?? ??? ???? ?? ??
  • 2. ??? ??? ??
  • - ??? ??? ??? 2???? ???? ??? ????? ?? ???
  • - ???? ????? ?? ??? ?? ??? ???? ?? ???? ?
    ??? ??? ?? ??
  • 3. ??? ??? ?? ?? ????? ???? ??? ??

17
Clustering Analysis
8.9 Hierarchical Clustering Method
? Hierarchical Clustering Method HCA
? ??? ??(Agglomerative Hierarchical Method AHM)
Polythetic Method ??? ?? ?? ?????
???? ????? ?? (???? ????? ?? ???
???? ??) ? ???(Divisive)
Monothetic Method ?? ??? ??? ????
???? ??? ??? ??? ?? ? ???? ??
  • Single Linkage Method
  • Complete Linkage Method
  • Average Linkage Method
  • Centroid Linkage Method
  • Median Linkage Method
  • Ward's Method
[Figure: N objects, AHM versus Divisive]
18
Clustering Analysis
8.9 Hierarchical Clustering Method
  • Methods of Clustering
    - Minimum distance (single linkage)
    - Maximum distance (complete linkage)
    - Average distance (average linkage): the most common
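The three linkage rules differ only in how they summarize the pairwise distances between two clusters. A minimal Python sketch follows; the clusters `A` and `B` and the function names are hypothetical.

```python
from itertools import product
from math import dist


def pairwise(c1, c2):
    """All Euclidean distances between a point of c1 and a point of c2."""
    return [dist(a, b) for a, b in product(c1, c2)]


def single_linkage(c1, c2):
    """Minimum distance between the two clusters."""
    return min(pairwise(c1, c2))


def complete_linkage(c1, c2):
    """Maximum distance between the two clusters."""
    return max(pairwise(c1, c2))


def average_linkage(c1, c2):
    """Mean of all between-cluster pairwise distances."""
    d = pairwise(c1, c2)
    return sum(d) / len(d)


# Two hypothetical clusters of two-variable observations.
A = [(0, 0), (0, 1)]
B = [(3, 0), (4, 0)]

print(single_linkage(A, B), average_linkage(A, B), complete_linkage(A, B))
```

For any pair of clusters the average-linkage distance lies between the single-linkage and complete-linkage distances.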
19
Clustering Analysis
8.9 Hierarchical Clustering Method
  • Other Agglomerative Methods of Clustering
    - Ward's method
    - Centroid method
[Figure: cluster centroids labeled c.g]
20
Clustering Analysis
8.9 Hierarchical Clustering Method
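  • Example: Single Linkage Method
1. The smallest entry in the distance matrix is d13 = 1.0, so objects 1 and 3 are merged into a cluster.
2. The next smallest distance is d24 = 3.0, so objects 2 and 4 are merged into a cluster.
3. Cluster (2, 4) and object 5 are then merged into cluster (2, 4, 5).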
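The merging steps of this example can be replayed in code. The distance matrix from the slide did not survive extraction, so the sketch below uses a hypothetical matrix chosen only to agree with the two values that did (d13 = 1.0 and d24 = 3.0) and with the third merge; the function name is also an assumption.

```python
# Hypothetical distances between 5 objects, keyed by (smaller, larger) label.
distances = {
    (1, 2): 6.0, (1, 3): 1.0, (1, 4): 9.0, (1, 5): 7.0,
    (2, 3): 8.0, (2, 4): 3.0, (2, 5): 5.0,
    (3, 4): 10.0, (3, 5): 11.0,
    (4, 5): 4.0,
}


def single_linkage_merges(dist, n):
    """Agglomerate n objects; return a list of (merged members, distance)."""
    clusters = [frozenset([i]) for i in range(1, n + 1)]

    def between(a, b):
        # Single linkage: the closest pair of members decides the distance.
        return min(dist[tuple(sorted((i, j)))] for i in a for j in b)

    merges = []
    while len(clusters) > 1:
        pairs = [(between(a, b), a, b)
                 for idx, a in enumerate(clusters)
                 for b in clusters[idx + 1:]]
        best, a, b = min(pairs, key=lambda t: t[0])
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((sorted(a | b), best))
    return merges


merges = single_linkage_merges(distances, 5)
for members, d in merges:
    print(members, d)
```

Reading the merge list from top to bottom gives the heights at which a dendrogram would join the clusters.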
21
Clustering Analysis
8.9 Hierarchical Clustering Method
? Example Single Linkage Method
Dendrogram
22
Clustering Analysis
8.9 Hierarchical Clustering Method
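  • Centroid Linkage Method (N = 5, 2 variables)
[Table: subjects by variables; values not recoverable]
1. Compute the distance matrix D.
2. d12 = 1.0 is the smallest distance, so objects 1 and 2 are merged into cluster (1, 2). The centroid of cluster (1, 2) is the mean of the two objects on each variable.
3. Repeat.
[Table: subjects by variables after merging; values not recoverable]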
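A short Python sketch of the centroid method follows. The slide's data table was lost, so the five subjects below are hypothetical values picked so that d12 = 1.0 is the smallest squared Euclidean distance, as in the example; using squared Euclidean distance between centroids is itself an assumption.

```python
# Hypothetical data: 5 subjects measured on 2 variables.
subjects = {1: (1, 1), 2: (1, 2), 3: (6, 3), 4: (8, 2), 5: (8, 0)}


def centroid_merges(points):
    """Agglomerate by centroid linkage; return (merged members, distance)."""
    # Each cluster is (member labels, centroid coordinates).
    clusters = [([k], tuple(map(float, v))) for k, v in points.items()]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum((x - y) ** 2
                        for x, y in zip(clusters[i][1], clusters[j][1]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        (m1, c1), (m2, c2) = clusters[i], clusters[j]
        n1, n2 = len(m1), len(m2)
        # The new centroid is the size-weighted mean of the two centroids.
        new_c = tuple((n1 * x + n2 * y) / (n1 + n2) for x, y in zip(c1, c2))
        merged = (sorted(m1 + m2), new_c)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        merges.append((merged[0], d))
    return merges


merges = centroid_merges(subjects)
for members, d in merges:
    print(members, round(d, 4))
```

After the first merge, cluster (1, 2) is represented by its centroid (1, 1.5), and all later distances are measured from that point rather than from the original two subjects.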
23
Clustering Analysis
8.9 Hierarchical Clustering Method
  • Centroid Linkage Method (N = 5, 2 variables)
Dendrogram
24
Clustering Analysis
8.9 Hierarchical Clustering Method
  • Single Linkage Method: ??? ??? ??(??? ???? ??), ?????? ???? ??
    SAS code: PROC CLUSTER with METHOD=SINGLE
  • Complete Linkage Method: ??? ??? ??? ??, ??? ???? ??
    SAS code: PROC CLUSTER with METHOD=COMPLETE
  • Average Linkage Method
    SAS code: PROC CLUSTER with METHOD=AVERAGE
  • Centroid Linkage Method: ?? ?? ??? ?? ??
    SAS code: PROC CLUSTER with METHOD=CENTROID
  • Median Linkage Method
    SAS code: PROC CLUSTER with METHOD=MEDIAN
  • Ward Method
    SAS code: PROC CLUSTER with METHOD=WARD

25
Clustering Analysis
8.10 K-Means Clustering Method
  • ? K-???? ??? ??? ??(Sequential Threshold
    Method)
  • ???? ?????? ?? ???
  • ??? ????? K? ???? ??? ??
  • K? ?? ?? ????? ?????? ??
  • ?? ???? ??? ?? ? ?? ?? ??

K?? ???? ??
???? ??/?? ??
? ??? ?? ?? ????
????
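The general idea of K-means clustering can be sketched in plain Python. This is a textbook-style iteration (assign each point to its nearest center, then recompute the centers), not the sequential threshold method named on the slide and not SAS PROC FASTCLUS; the data, seeds, and function name are hypothetical.

```python
from math import dist


def k_means(points, seeds, max_iter=10):
    """Return (cluster index per point, final centers)."""
    centers = [tuple(s) for s in seeds]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_assignment = [
            min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the solution is stable
        assignment = new_assignment
        # Update step: each center moves to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centers[k] = tuple(sum(c) / len(members)
                                   for c in zip(*members))
    return assignment, centers


# Hypothetical two-variable data with deliberately poor initial seeds.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
assignment, centers = k_means(points, seeds=[(1, 1), (2, 1)])
print(assignment)
print(centers)
```

Even though both seeds start inside the same group of points, the reassignment step moves one center to the other group within a few iterations.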
26
Clustering Analysis
8.11 SAS SPSS Program

Hierarchical Clustering Method SAS ? ??????
????? ?? ??? ???? ???? ???? ?? ??. ??? ????? ???
?? 6?? ??? ????? ???(Subject 10). (X1) ??? ??
?? (X2) ??? ??? ??? ??? ?? (X3) ????? ???
?? (X4) ??? ?? ??? ?????? ?? (X5) ??? ???
?? (X6) ??? ????? ?? ?? ? ??

7-point Likert scale:
?? ?? ??(1) -------------- ??(4) -------------- ?? ??(7)
27
Clustering Analysis
8.11 SAS SPSS Program
Hierarchical Clustering Method
28
Clustering Analysis
8.11 SAS SPSS Program

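Hierarchical Clustering Method: SAS Code

/* Program 1: standardize with PROC STANDARD, then cluster. */
/* The trailing @@ lets several observations share one data line. */
DATA QUEST; INPUT X1-X6 @@; CARDS;
0.06 40 7 3 2 3   0.02 30 1 4 5 4
0.07 20 6 4 1 3   0.04 60 4 5 3 6
0.01 30 2 2 6 4   0.06 40 6 3 3 4
0.05 30 6 3 3 4   0.07 30 7 4 1 4
0.02 40 3 3 6 3   0.03 50 3 6 4 6
;
RUN;
PROC STANDARD DATA=QUEST MEAN=0 STD=1 OUT=TWO; RUN;
PROC CLUSTER DATA=TWO METHOD=CENTROID OUTTREE=TWO; VAR X1-X6; RUN;

/* Program 2: standardize inside PROC CLUSTER (STD), then draw the tree. */
DATA QUEST; INPUT X1-X6 @@; CARDS;
6 4 7 3 2 3   2 3 1 4 5 4
7 2 6 4 1 3   4 6 4 5 3 6
1 3 2 2 6 4   6 4 6 3 3 4
5 3 6 3 3 4   7 3 7 4 1 4
2 4 3 3 6 3   3 5 3 6 4 6
;
RUN;
PROC CLUSTER DATA=QUEST STD METHOD=CENTROID OUTTREE=TWO; VAR X1-X6; RUN;
PROC TREE DATA=TWO HORIZONTAL; RUN;

Other options: METHOD=SINGLE (single linkage), METHOD=COMPLETE (complete linkage), METHOD=AVERAGE (average linkage).
STD: standardizes the variables to mean 0 and variance 1.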
29
Clustering Analysis
8.11 SAS SPSS Program

Hierarchical Clustering Method SAS ??
? ?????? ???


Centroid Hierarchical Cluster Analysis
The data have been standardized to mean 0 and variance 1
Root-Mean-Square Total-Sample Standard Deviation = 1
Root-Mean-Square Distance Between Observations = 3.464102

Number of   Clusters Joined   Frequency of   Normalized          Tie
Clusters                      New Cluster    Centroid Distance
9           OB6    OB7        2              0.281052
8           OB1    CL9        3              0.361764
7           OB3    OB8        2              0.385276
6           OB4    OB10       2              0.428126
5           OB5    OB9        2              0.476894
4           OB2    CL5        3              0.490703
3           CL8    CL7        5              0.497510
2           CL3    CL4        8              1.016941
1           CL2    CL6        10             1.029886
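The cluster history above can be replayed to list the members of each cluster at any chosen number of clusters. The Python sketch below copies the "Clusters Joined" column from the output; the function name `clusters_at` is hypothetical, and it relies on the naming visible in the output, where CLn is the cluster created at the step that leaves n clusters.

```python
# (number of clusters after the merge, first cluster joined, second cluster joined)
schedule = [
    (9, "OB6", "OB7"), (8, "OB1", "CL9"), (7, "OB3", "OB8"),
    (6, "OB4", "OB10"), (5, "OB5", "OB9"), (4, "OB2", "CL5"),
    (3, "CL8", "CL7"), (2, "CL3", "CL4"), (1, "CL2", "CL6"),
]


def clusters_at(schedule, n_obs, k):
    """Members of each cluster when the history is cut at k clusters."""
    members = {f"OB{i}": {i} for i in range(1, n_obs + 1)}
    for n_clusters, a, b in schedule:
        if n_clusters < k:
            break  # stop before merging below k clusters
        members[f"CL{n_clusters}"] = members.pop(a) | members.pop(b)
    return sorted(sorted(m) for m in members.values())


print(clusters_at(schedule, 10, 3))
print(clusters_at(schedule, 10, 2))
```

The large jump in distance between the 3-cluster step (0.497510) and the 2-cluster step (1.016941) is the kind of gap that is commonly used to pick the number of clusters.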
30
Clustering Analysis
8.11 SAS SPSS Program

Hierarchical Clustering Method SAS Code
31
Clustering Analysis
8.11 SAS SPSS Program

Hierarchical Clustering Method SAS Code
  • The results and the dendrogram show that objects 6 and 7 are the closest pair; objects 3 and 8, 5 and 9, and 4 and 10 also pair up.
  • With 3 clusters: (6, 7, 1, 3, 8), (5, 9, 2), (4, 10)
  • With 2 clusters: (6, 7, 1, 3, 8, 5, 9, 2), (4, 10)
  • Cluster 1: ?? ??? ??? ??(6.20), ????? ??? ??(6.40), ??? ?? ??(2.00)
  • Cluster 2: ??? ??? ??? ??, ????? ??? ?? ??, ??? ?? ??
  • Cluster 3: ???? ???? ??? ??? ???, ??? ?? ??? ???? ?? ??, ????? ??? ?? ?? ??
32
Clustering Analysis
8.11 SAS SPSS Program

Hierarchical Clustering Method SAS Code
? ?? ? ?? ????? ??


First cluster history:

Number of   Clusters Joined   Frequency of   Distance
Clusters                      New Cluster
9           OB6    OB7        2              0.281052
8           OB1    CL9        3              0.361764
7           OB3    OB8        2              0.385276
6           OB4    OB10       2              0.428126
5           OB5    OB9        2              0.476894
4           OB2    CL5        3              0.490703
3           CL8    CL7        5              0.497510
2           CL3    CL4        8              1.016941
1           CL2    CL6        10             1.029886

Second cluster history:

Number of   Clusters Joined   Frequency of   Distance
Clusters                      New Cluster
9           OB1    OB6        2              0.101674
8           OB2    OB5        2              0.143790
7           OB7    OB8        2              0.143794
6           CL9    OB9        3              0.292047
5           CL8    CL7        4              0.359483
4           CL6    CL5        7              0.593705
3           OB4    OB10       2              0.595757
2           CL4    OB3        8              0.860206
1           CL2    CL3        10             1.336713

??? ??
???? ??
33
Clustering Analysis
8.11 SAS SPSS Program

Hierarchical Clustering Method SPSS
  • Hierarchical (versus K-means)
  • Cluster: Cases
  • Display: Stats / Plots
  • Stats: agglomeration schedule (distance between clusters), proximity matrix
  • Cluster Membership: none, single, range (from -- to -- clusters)
  • Plot: dendrograms or icicle plots
  • Method: cluster method; measure (interval, counts, binary); transform values or measures
  • Save
34
Clustering Analysis
8.11 SAS SPSS Program

Hierarchical Clustering Method SPSS
??? ??
???? ?? Analyze → Classify
35
Clustering Analysis
8.11 SAS SPSS Program

Hierarchical Clustering Method SPSS
Statistics Plots Method Save
?? ?? ????
36
Clustering Analysis
8.11 SAS SPSS Program

Hierarchical Clustering Method SPSS
Statistics Plots Method Save
37
Clustering Analysis
8.11 SAS SPSS Program

Hierarchical Clustering Method SPSS
??? ??? ??
Dendrogram
38
Clustering Analysis
8.11 SAS SPSS Program
Non-Hierarchical Clustering Method
39
Clustering Analysis
8.11 SAS SPSS Program
K-Means Clustering: SAS Code

/* Program 1: Likert-scale data. The trailing @@ lets several observations share one data line. */
DATA QUEST; INPUT X1-X6 @@; CARDS;
6 4 7 3 2 3   2 3 1 4 5 4
7 2 6 4 1 3   4 6 4 5 3 6
1 3 2 2 6 4   6 4 6 3 3 4
5 3 6 3 3 4   7 3 7 4 1 4
2 4 3 3 6 3   3 5 3 6 4 6
;
RUN;
PROC STANDARD DATA=QUEST MEAN=0 STD=1 OUT=TWO; RUN;
PROC FASTCLUS DATA=TWO LIST MAXCLUSTERS=3 MAXITER=10; VAR X1-X6; RUN;

/* Program 2: the same subjects with X1 and X2 on different scales. */
DATA QUEST; INPUT X1-X6 @@; CARDS;
0.06 40 7 3 2 3   0.02 30 1 4 5 4
0.07 20 6 4 1 3   0.04 60 4 5 3 6
0.01 30 2 2 6 4   0.06 40 6 3 3 4
0.05 30 6 3 3 4   0.07 30 7 4 1 4
0.02 40 3 3 6 3   0.03 50 3 6 4 6
;
RUN;
PROC STANDARD DATA=QUEST MEAN=0 STD=1 OUT=TWO; RUN;
PROC FASTCLUS DATA=TWO LIST MAXCLUSTERS=3 MAXITER=10; VAR X1-X6; RUN;
??? ??
? ??? ??? ????? ??? ??? ?? Seed??? ??
?? ?? ??
Seed? ???? ?? ?? ?? ?
40
Clustering Analysis
8.11 SAS SPSS Program
K-Means Clustering Results

FASTCLUS Procedure
Replace=FULL  Radius=0  Maxclusters=3  Maxiter=10

Initial Seeds
Cluster   X1         X2         X3         X4         X5         X6
1         -0.13553    1.98361   -0.23009    1.12117   -0.21764    1.72648
2         -1.49079   -0.60371   -1.15045   -1.46615    1.41468   -0.09087
3          1.21974   -1.46615    0.69027    0.25873   -1.30586   -0.99954

Minimum Distance Between Initial Seeds = 4.694619

                         Relative Change in Cluster Seeds
Iteration   Criterion    1         2         3
1           0.6465       0.1580    0.2174    0.3084
2           0.4157       0         0         0

Convergence criterion is satisfied.


41
Clustering Analysis
8.11 SAS SPSS Program
K-Means Clustering???? ???? ??

Cluster Listing (first run)
Obs   Cluster   Distance from Seed
1     3         0.98828
2     2         1.13323
3     3         1.44798
4     1         0.74154
5     2         1.02068
6     3         1.03211
7     3         0.95115
8     3         0.96568
9     2         0.98229
10    1         0.74154
Criterion Based on Final Seeds = 0.41566

Cluster Listing (second run)
Obs   Cluster   Distance from Seed
1     1         5.6886
2     1         6.7350
3     3         6.7495
4     2         5.0744
5     1         6.5544
6     1         4.7917
7     3         3.6818
8     3         3.4960
9     1         4.4227
10    2         5.0744
Criterion Based on Final Seeds = 2.1834
???? ??
??? ??
Cluster 1 (1, 2, 5, 6, 9), Cluster 2 (4, 10), Cluster 3 (3, 7, 8)
Cluster 1 (4, 10), Cluster 2 (2, 5, 9), Cluster 3 (1, 3, 6, 7, 8)