Title: Cluster Analysis
1 Cluster Analysis
By Rahmad Wijaya
2 Outline
- 1. Basic Concepts
- 2. Statistics Associated with Cluster Analysis
- 3. Steps in Cluster Analysis
  - Formulate the Problem
  - Select a Distance or Similarity Measure
  - Select a Clustering Procedure
  - Decide on the Number of Clusters
  - Interpret and Profile the Clusters
  - Assess Reliability and Validity
3 Basic Concepts
- Cluster analysis is a technique for grouping objects or cases into relatively homogeneous groups called clusters.
- Cluster analysis is also known as:
  - Classification analysis
  - Numerical taxonomy
- Groupings encountered in practice often differ from the ideal grouping.
- Differences between discriminant analysis and cluster analysis
4 An Ideal Clustering Situation
5 A Practical Clustering Situation
6 Uses of Cluster Analysis
- Examples:
  - Segmenting the market
  - Understanding buyer behavior
  - Identifying new product opportunities
  - Selecting test markets
  - Reducing data
7 Statistics Associated with Cluster Analysis
- Agglomeration schedule
- Cluster centroid
- Cluster Centers
- Cluster membership
- Dendrogram
- Distance between cluster centers
- Icicle diagram
8 Steps in Cluster Analysis
Formulate the Problem
Select a Distance or Similarity Measure
Select a Clustering Procedure
Decide on the Number of Clusters
Interpret and Profile the Clusters
Assess Reliability and Validity
9 Formulate the Problem
Example: cluster consumers based on their attitudes toward shopping.
Based on previous research, six attitudinal variables were identified.
Consumers were asked to express their degree of agreement with the
following statements on a seven-point scale:
V1 Shopping is fun.
V2 Shopping is bad for your budget.
V3 I combine shopping with eating out.
V4 I try to get the best buys while shopping.
V5 I don't care about shopping.
V6 You can save a lot of money by comparing prices.
The data obtained from 20 respondents are as follows:
10 Raw Data
Case No.  V1  V2  V3  V4  V5  V6
1         6   4   7   3   2   3
2         2   3   1   4   5   4
3         7   2   6   4   1   3
4         4   6   4   5   3   6
5         1   3   2   2   6   4
6         6   4   6   3   3   4
7         5   3   6   3   3   4
8         7   3   7   4   1   4
9         2   4   3   3   6   3
10        3   5   3   6   4   6
11        1   3   2   3   5   3
12        5   4   5   4   2   4
13        2   2   1   5   4   4
14        4   6   4   6   4   7
15        6   5   4   2   1   4
16        3   5   4   6   4   7
17        4   4   7   2   2   5
18        3   7   2   6   4   3
19        4   6   3   7   2   7
20        2   3   2   4   7   2
11 Select a Distance or Similarity Measure
Because the objective of clustering is to group similar objects
together, some measure is needed to assess how different or how
similar the objects are. The most commonly used measures are:
- Euclidean distance: the square root of the sum of the squared
differences in values for each variable.
- City-block (Manhattan) distance: the sum of the absolute
differences in values for each variable.
- Chebychev distance: the maximum absolute difference in
values on any variable.
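The three measures can be illustrated on cases 1 and 2 of the raw data. This is a minimal sketch using only the Python standard library; the function names are chosen here for illustration and do not come from the slides.

```python
import math

# Cases 1 and 2 from the raw data table (V1..V6).
case_1 = [6, 4, 7, 3, 2, 3]
case_2 = [2, 3, 1, 4, 5, 4]

def euclidean(a, b):
    """Square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def city_block(a, b):
    """Sum of absolute differences (Manhattan distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def chebychev(a, b):
    """Largest absolute difference on any single variable."""
    return max(abs(x - y) for x, y in zip(a, b))

print(euclidean(case_1, case_2))   # 8.0
print(city_block(case_1, case_2))  # 16
print(chebychev(case_1, case_2))   # 6
```

The differences between the two cases are 4, 1, 6, -1, -3, -1, so the squared differences sum to 64, the absolute differences sum to 16, and the largest absolute difference is 6.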
12 A Classification of Clustering Procedures
Clustering Procedures
- Hierarchical
  - Agglomerative
    - Linkage Methods
      - Single
      - Complete
      - Average
    - Variance Methods
    - Centroid Methods
  - Divisive
- Nonhierarchical
  - Sequential Threshold
  - Parallel Threshold
  - Optimizing Partitioning
13 Linkage Methods of Clustering
14 Other Agglomerative Clustering Methods
15 Hierarchical Clustering Output
16 Vertical Icicle Plot
17 Dendrogram Using Ward's Method
[Figure: dendrogram. Rows are labeled by case (Case Label, Seq); the horizontal axis is the Rescaled Distance Cluster Combine, with ticks from 0 to 25.]
18 Cluster Membership
Number of cases per cluster
Cluster  4 clusters  3 clusters  2 clusters
1        8           8           8
2        6           6           12
3        5           6
4        1
19 Decide on the Number of Clusters
- Guidelines for deciding on the number of clusters:
  - Theoretical, conceptual, or practical considerations may suggest a certain number of clusters.
  - In hierarchical clustering, the distances at which clusters are combined can be used as a criterion. This information can be obtained from the agglomeration schedule or from the dendrogram.
  - In nonhierarchical clustering, the ratio of within-group variance to between-group variance can be plotted against the number of clusters. The point at which an elbow or a sharp bend occurs indicates an appropriate number of clusters.
  - The relative sizes of the clusters should be meaningful. A simple frequency count of the Cluster Membership table shows that the three-cluster solution results in clusters with eight, six, and six elements. The four-cluster solution, however, has clusters of sizes eight, six, five, and one. It is not meaningful to have a cluster with only one case.
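The frequency count mentioned in the last guideline can be sketched as follows. The membership list is transcribed from the cluster membership column of the centroid worksheet in these slides; the helper `has_singleton` is a hypothetical name introduced only for this illustration.

```python
from collections import Counter

# Three-cluster membership for cases 1..20, transcribed from the
# cluster membership column of the centroid worksheet in these slides.
membership = [1, 2, 1, 3, 2, 1, 1, 1, 2, 3,
              2, 1, 2, 3, 1, 3, 1, 3, 3, 2]

def cluster_sizes(labels):
    """Return {cluster: number of cases}, ordered by cluster label."""
    counts = Counter(labels)
    return {cluster: counts[cluster] for cluster in sorted(counts)}

def has_singleton(labels):
    """True if any cluster contains exactly one case (hypothetical check)."""
    return 1 in cluster_sizes(labels).values()

sizes = cluster_sizes(membership)
print(sizes)                      # {1: 8, 2: 6, 3: 6}
print(has_singleton(membership))  # False
```

A solution that returns `True` from `has_singleton`, such as the four-cluster solution with sizes eight, six, five, and one, would be a candidate for rejection under this guideline.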
20 Cluster Centroids
The cluster centroid values can be obtained from the
K-Means Cluster procedure (see Final
Cluster Centers).
21 Computing Cluster Centroids with MS Excel
Resp. No.           V1    V2    V3    V4    V5    V6    Cluster membership
1                   6     4     7     3     2     3     1
3                   7     2     6     4     1     3     1
6                   6     4     6     3     3     4     1
7                   5     3     6     3     3     4     1
8                   7     3     7     4     1     4     1
12                  5     4     5     4     2     4     1
15                  6     5     4     2     1     4     1
17                  4     4     7     2     2     5     1
Cluster 1 centroid  5.75  3.63  6.00  3.13  1.88  3.88

2                   2     3     1     4     5     4     2
5                   1     3     2     2     6     4     2
9                   2     4     3     3     6     3     2
11                  1     3     2     3     5     3     2
13                  2     2     1     5     4     4     2
20                  2     3     2     4     7     2     2
Cluster 2 centroid  1.67  3.00  1.83  3.50  5.50  3.33

4                   4     6     4     5     3     6     3
10                  3     5     3     6     4     6     3
14                  4     6     4     6     4     7     3
16                  3     5     4     6     4     7     3
18                  3     7     2     6     4     3     3
19                  4     6     3     7     2     7     3
Cluster 3 centroid  3.50  5.83  3.33  6.00  3.50  6.00
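The same column averages can be computed outside a spreadsheet. The following is a minimal sketch in Python, assuming the raw data and cluster memberships listed in these slides; the function name `cluster_centroids` is introduced here for illustration.

```python
# Raw data for cases 1..20 (V1..V6), copied from the raw data table.
CASES = [
    (6, 4, 7, 3, 2, 3), (2, 3, 1, 4, 5, 4), (7, 2, 6, 4, 1, 3), (4, 6, 4, 5, 3, 6),
    (1, 3, 2, 2, 6, 4), (6, 4, 6, 3, 3, 4), (5, 3, 6, 3, 3, 4), (7, 3, 7, 4, 1, 4),
    (2, 4, 3, 3, 6, 3), (3, 5, 3, 6, 4, 6), (1, 3, 2, 3, 5, 3), (5, 4, 5, 4, 2, 4),
    (2, 2, 1, 5, 4, 4), (4, 6, 4, 6, 4, 7), (6, 5, 4, 2, 1, 4), (3, 5, 4, 6, 4, 7),
    (4, 4, 7, 2, 2, 5), (3, 7, 2, 6, 4, 3), (4, 6, 3, 7, 2, 7), (2, 3, 2, 4, 7, 2),
]
# Cluster membership of cases 1..20 from the worksheet.
MEMBERSHIP = [1, 2, 1, 3, 2, 1, 1, 1, 2, 3,
              2, 1, 2, 3, 1, 3, 1, 3, 3, 2]

def cluster_centroids(cases, membership):
    """Average each variable over the cases assigned to each cluster."""
    groups = {}
    for values, cluster in zip(cases, membership):
        groups.setdefault(cluster, []).append(values)
    return {
        cluster: [sum(column) / len(rows) for column in zip(*rows)]
        for cluster, rows in sorted(groups.items())
    }

centroids = cluster_centroids(CASES, MEMBERSHIP)
for cluster, center in centroids.items():
    print(cluster, [round(value, 3) for value in center])
```

Each centroid is simply the mean of V1 to V6 over the cases in that cluster, which is what the spreadsheet's average rows contain.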
22 Interpret and Profile the Clusters
From the cluster centroid table:
- Cluster 1 has relatively high values on V1 (Shopping is fun) and V3 (I combine
shopping with eating out), so it can be labeled
"fun-loving and concerned shoppers".
- Cluster 2 has a relatively high value on V5 (I don't care about shopping),
so it can be labeled "apathetic shoppers".
- Cluster 3 has relatively high values on V2 (Shopping is bad for your budget),
V4 (I try to get the best buys while shopping), and V6 (You can
save a lot of money by comparing prices), so it can be labeled
"economical shoppers".
23 Assess Reliability and Validity
Formal procedures for assessing the reliability and
validity of clustering solutions are complex. The following
procedures provide adequate checks on the quality of
clustering results:
1. Perform cluster analysis on the same data using different distance measures.
Compare the results across measures to determine
the stability of the solutions.
2. Use different methods of clustering and compare the results.
3. Split the data randomly into halves. Perform
clustering separately on each half. Compare
cluster centroids across the two subsamples.
4. Delete variables randomly. Perform clustering
based on the reduced set of variables. Compare
the results with those obtained by clustering
based on the entire set of variables.
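Check 2 (comparing the results of different methods) can be sketched as a label-agreement computation. Cluster numbers are arbitrary, so the sketch tries every relabeling of one solution before counting matches. The two label lists are transcribed from the hierarchical membership worksheet and the nonhierarchical case listing in these slides; `best_agreement` is a hypothetical helper that assumes both solutions have the same number of clusters.

```python
from itertools import permutations

# Membership of cases 1..20 under the two solutions shown in these slides.
hierarchical = [1, 2, 1, 3, 2, 1, 1, 1, 2, 3, 2, 1, 2, 3, 1, 3, 1, 3, 3, 2]
k_means      = [3, 2, 3, 1, 2, 3, 3, 3, 2, 1, 2, 3, 2, 1, 3, 1, 3, 1, 1, 2]

def best_agreement(labels_a, labels_b):
    """Return (share of matching cases, relabeling of labels_b) for the
    relabeling of labels_b that matches labels_a most often."""
    names_a = sorted(set(labels_a))
    names_b = sorted(set(labels_b))
    if len(names_a) != len(names_b):
        raise ValueError("solutions must have the same number of clusters")
    best_share, best_map = -1.0, {}
    for perm in permutations(names_a):
        mapping = dict(zip(names_b, perm))
        hits = sum(mapping[b] == a for a, b in zip(labels_a, labels_b))
        share = hits / len(labels_a)
        if share > best_share:
            best_share, best_map = share, mapping
    return best_share, best_map

share, mapping = best_agreement(hierarchical, k_means)
print(share)    # 1.0
print(mapping)  # {1: 3, 2: 2, 3: 1}
```

Here the two solutions place all 20 cases in the same groups once nonhierarchical clusters 1 and 3 are swapped. Trying all permutations is only practical for a small number of clusters.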
24 Results of Nonhierarchical Clustering
Initial Cluster Centers
Cluster  V1      V2      V3      V4      V5      V6
1        4.0000  6.0000  3.0000  7.0000  2.0000  7.0000
2        2.0000  3.0000  2.0000  4.0000  7.0000  2.0000
3        7.0000  2.0000  6.0000  4.0000  1.0000  3.0000
Classification Cluster Centers
Cluster  V1      V2      V3      V4      V5      V6
1        3.8135  5.8992  3.2522  6.4891  2.5149  6.6957
2        1.8507  3.0234  1.8327  3.7864  6.4436  2.5056
3        6.3558  2.8356  6.1576  3.6736  1.3047  3.2010
Case Listing of Cluster Membership
Case ID  Cluster  Distance    Case ID  Cluster  Distance
1        3        1.780       2        2        2.254
3        3        1.174       4        1        1.882
5        2        2.525       6        3        2.340
7        3        1.862       8        3        1.410
9        2        1.843       10       1        2.112
11       2        1.923       12       3        2.400
13       2        3.382       14       1        1.772
15       3        3.605       16       1        2.137
17       3        3.760       18       1        4.421
19       1        0.853       20       2        0.813
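How cases end up assigned to centers can be sketched with a plain k-means loop: assign each case to its nearest center, recompute each center as the mean of its cases, and repeat until assignments stop changing. This is an illustrative sketch only; it starts from the initial cluster centers listed in these slides, but it is not necessarily the algorithm used by the software that produced the output shown here, so intermediate centers and distances may differ.

```python
# Raw data for cases 1..20 (V1..V6), copied from the raw data table.
CASES = [
    (6, 4, 7, 3, 2, 3), (2, 3, 1, 4, 5, 4), (7, 2, 6, 4, 1, 3), (4, 6, 4, 5, 3, 6),
    (1, 3, 2, 2, 6, 4), (6, 4, 6, 3, 3, 4), (5, 3, 6, 3, 3, 4), (7, 3, 7, 4, 1, 4),
    (2, 4, 3, 3, 6, 3), (3, 5, 3, 6, 4, 6), (1, 3, 2, 3, 5, 3), (5, 4, 5, 4, 2, 4),
    (2, 2, 1, 5, 4, 4), (4, 6, 4, 6, 4, 7), (6, 5, 4, 2, 1, 4), (3, 5, 4, 6, 4, 7),
    (4, 4, 7, 2, 2, 5), (3, 7, 2, 6, 4, 3), (4, 6, 3, 7, 2, 7), (2, 3, 2, 4, 7, 2),
]
# Initial cluster centers 1..3 as listed in the slides.
INITIAL_CENTERS = [(4, 6, 3, 7, 2, 7), (2, 3, 2, 4, 7, 2), (7, 2, 6, 4, 1, 3)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, initial_centers, max_iter=100):
    """Return (labels, centers); labels are 1-based cluster numbers."""
    centers = [list(center) for center in initial_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Update step: each center becomes the mean of its assigned cases.
        for k in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return [label + 1 for label in labels], centers

labels, centers = k_means(CASES, INITIAL_CENTERS)
print(labels)
```

On this data the loop stabilizes after one update, and the resulting labels agree with the Cluster column of the case listing.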
25 Final Cluster Centers
Cluster  V1      V2      V3      V4      V5      V6
1        3.5000  5.8333  3.3333  6.0000  3.5000  6.0000
2        1.6667  3.0000  1.8333  3.5000  5.5000  3.3333
3        5.7500  3.6250  6.0000  3.1250  1.7500  3.8750
Distances between Final Cluster Centers
Cluster  1       2       3
1        0.0000
2        5.5678  0.0000
3        5.7353  6.9944  0.0000
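The distance table can be checked by computing Euclidean distances between the final cluster centers. A minimal sketch, using the center values exactly as reported in the table above (rounded to four decimals, so the results agree only up to rounding):

```python
import math

# Final cluster centers as reported in the output (V1..V6).
final_centers = {
    1: (3.5000, 5.8333, 3.3333, 6.0000, 3.5000, 6.0000),
    2: (1.6667, 3.0000, 1.8333, 3.5000, 5.5000, 3.3333),
    3: (5.7500, 3.6250, 6.0000, 3.1250, 1.7500, 3.8750),
}

# Euclidean distance for every pair of centers (math.dist needs Python 3.8+).
center_distances = {
    (a, b): math.dist(final_centers[a], final_centers[b])
    for a in final_centers
    for b in final_centers
    if a < b
}

for pair, distance in center_distances.items():
    print(pair, round(distance, 4))
```

Larger distances indicate better-separated clusters; here clusters 2 and 3 are the farthest apart.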
Analysis of Variance
Variable  Cluster MS  df  Error MS  df  F        p
V1        29.1083     2   0.6078    17  47.8879  .000
V2        13.5458     2   0.6299    17  21.5047  .000
V3        31.3917     2   0.8333    17  37.6700  .000
V4        15.7125     2   0.7279    17  21.5848  .000
V5        24.1500     2   0.7353    17  32.8440  .000
V6        12.1708     2   1.0711    17  11.3632  .001
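Each F value in the table is the cluster mean square divided by the error mean square. The sketch below recomputes the ratios from the reported mean squares; because the inputs are rounded to four decimals, the results match the reported F values only approximately.

```python
# (cluster MS, error MS) per variable, as reported in the ANOVA table.
mean_squares = {
    "V1": (29.1083, 0.6078),
    "V2": (13.5458, 0.6299),
    "V3": (31.3917, 0.8333),
    "V4": (15.7125, 0.7279),
    "V5": (24.1500, 0.7353),
    "V6": (12.1708, 1.0711),
}

# F = between-cluster mean square / within-cluster (error) mean square.
f_ratios = {
    variable: cluster_ms / error_ms
    for variable, (cluster_ms, error_ms) in mean_squares.items()
}

for variable, f_value in f_ratios.items():
    print(variable, round(f_value, 2))
```

A large F for a variable means the cluster means differ a lot on that variable relative to the spread within clusters. Since the clusters were formed to maximize exactly these differences, the F values are best read descriptively.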
Number of Cases in Each Cluster
Cluster  Unweighted Cases  Weighted Cases
1        6                 6
2        6                 6
3        8                 8
Missing  0
Total    20                20