Cluster Analysis - PowerPoint PPT Presentation

About This Presentation
Title:

Cluster Analysis

Description:

Last modified by: Wied. Document presentation format: On-screen Show.

Slides: 26
Provided by: acid150


Transcript and Presenter's Notes

Title: Cluster Analysis


1
Cluster Analysis
By Rahmad Wijaya
2
Topics
  • 1. Basic Concepts
  • 2. Statistics Associated with Cluster Analysis
  • 3. Steps in Cluster Analysis
    • Formulate the Problem
    • Select a Distance or Similarity Measure
    • Select a Clustering Procedure
    • Decide on the Number of Clusters
    • Interpret and Profile the Clusters
    • Assess Reliability and Validity

3
Basic Concepts
  • Cluster analysis is a technique for grouping
    objects or cases into relatively homogeneous
    groups called clusters.
  • Cluster analysis is also often called
    • Classification Analysis
    • Numerical Taxonomy
  • In practice, the grouping obtained often differs
    from the ideal grouping.
  • Difference between discriminant analysis and
    cluster analysis: in discriminant analysis, group
    membership is known in advance, whereas in cluster
    analysis the groups are not known a priori.

4
Ideal Clustering Situation
5
Practical Clustering Situation
6
Uses of Cluster Analysis
  • Examples
    • Market segmentation
    • Understanding buyer behavior
    • Identifying new product opportunities
    • Selecting test markets
    • Reducing data

7
Statistics Associated with Cluster Analysis
  • Agglomeration schedule
  • Cluster centroid
  • Cluster centers
  • Cluster membership
  • Dendrogram
  • Distances between cluster centers
  • Icicle diagram

8
Steps in Cluster Analysis
Formulate the Problem
Select a Distance or Similarity Measure
Select a Clustering Procedure
Decide on the Number of Clusters
Interpret and Profile the Clusters
Assess Reliability and Validity
9
Formulate the Problem
Example: clustering consumers based on their attitudes
toward shopping. Based on previous research, six
attitudinal variables were identified. Consumers were
asked to state their degree of agreement with the
following statements on a 7-point scale:
V1 Shopping is fun.
V2 Shopping is bad for your budget.
V3 I combine shopping with eating out.
V4 I try to get the best buys while shopping.
V5 I don't care about shopping.
V6 You can save a lot of money by comparing prices.
The data obtained from 20 respondents are as follows.
10
Raw Data
Case No.  V1  V2  V3  V4  V5  V6
1         6   4   7   3   2   3
2         2   3   1   4   5   4
3         7   2   6   4   1   3
4         4   6   4   5   3   6
5         1   3   2   2   6   4
6         6   4   6   3   3   4
7         5   3   6   3   3   4
8         7   3   7   4   1   4
9         2   4   3   3   6   3
10        3   5   3   6   4   6
11        1   3   2   3   5   3
12        5   4   5   4   2   4
13        2   2   1   5   4   4
14        4   6   4   6   4   7
15        6   5   4   2   1   4
16        3   5   4   6   4   7
17        4   4   7   2   2   5
18        3   7   2   6   4   3
19        4   6   3   7   2   7
20        2   3   2   4   7   2
11
Select a Distance or Similarity Measure
Because the objective of clustering is to group similar
objects together, some measure is needed to assess how
different or similar the objects are. Commonly used
measures:
Euclidean distance is the square root of the sum of the
squared differences in values for each variable.
City-block or Manhattan distance is the sum of the
absolute differences in values for each variable.
Chebychev distance is the maximum absolute difference in
values for any variable.
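The three measures above can be sketched in Python as follows, using respondents 1 and 2 from the raw data. The function names are illustrative and do not come from any particular statistics package.

```python
import math

def euclidean(x, y):
    # square root of the sum of squared differences over all variables
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def city_block(x, y):
    # Manhattan distance: sum of absolute differences
    return sum(abs(a - b) for a, b in zip(x, y))

def chebychev(x, y):
    # largest absolute difference on any single variable
    return max(abs(a - b) for a, b in zip(x, y))

case1 = (6, 4, 7, 3, 2, 3)  # respondent 1, V1..V6
case2 = (2, 3, 1, 4, 5, 4)  # respondent 2, V1..V6

print(euclidean(case1, case2))   # 8.0
print(city_block(case1, case2))  # 16
print(chebychev(case1, case2))   # 6
```

The differences for these two respondents are 4, 1, 6, 1, 3, and 1. Euclidean distance therefore weights the large V3 gap most heavily, while Chebychev distance looks only at that gap.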
12
Classification of Clustering Procedures
Clustering Procedures
  • Hierarchical
    • Agglomerative
      • Linkage Methods
        • Single
        • Complete
        • Average
      • Variance Methods
      • Centroid Methods
    • Divisive
  • Nonhierarchical
    • Sequential Threshold
    • Parallel Threshold
    • Optimizing Partitioning
13
Linkage Methods of Clustering
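The following is a minimal sketch of the three linkage rules named in the classification (single, complete, average). It assumes Euclidean distance between cases. The clusters {case 1, case 3} and {case 2, case 5} are chosen only for illustration.

```python
import math
from itertools import product

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def single_linkage(A, B):
    # nearest neighbour: smallest distance between a member of A and a member of B
    return min(euclidean(a, b) for a, b in product(A, B))

def complete_linkage(A, B):
    # furthest neighbour: largest distance between a member of A and a member of B
    return max(euclidean(a, b) for a, b in product(A, B))

def average_linkage(A, B):
    # mean of all between-cluster pairwise distances
    return sum(euclidean(a, b) for a, b in product(A, B)) / (len(A) * len(B))

A = [(6, 4, 7, 3, 2, 3), (7, 2, 6, 4, 1, 3)]  # cases 1 and 3
B = [(2, 3, 1, 4, 5, 4), (1, 3, 2, 2, 6, 4)]  # cases 2 and 5

print(single_linkage(A, B))    # 8.0 (case 1 vs case 2)
print(complete_linkage(A, B))  # sqrt(83), about 9.11 (case 3 vs case 5)
print(average_linkage(A, B))   # mean of the four pairwise distances
```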
14
Other Agglomerative Clustering Methods
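The variance (Ward's) method and the centroid method can be sketched the same way. The sketch assumes the usual textbook definitions:

- The centroid method measures the distance between the cluster means.
- Ward's method merges the pair of clusters whose union raises the total within-cluster sum of squares the least.

The last line checks Ward's shortcut formula against a direct computation.

```python
def centroid(cluster):
    n = len(cluster)
    return [sum(p[k] for p in cluster) / n for k in range(len(cluster[0]))]

def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def sse(cluster):
    # within-cluster sum of squared distances to the cluster centroid
    c = centroid(cluster)
    return sum(sq_dist(p, c) for p in cluster)

def centroid_distance(A, B):
    # centroid method: Euclidean distance between the two cluster means
    return sq_dist(centroid(A), centroid(B)) ** 0.5

def ward_increase(A, B):
    # Ward's method: growth in within-cluster sum of squares caused by merging A and B
    na, nb = len(A), len(B)
    return na * nb / (na + nb) * sq_dist(centroid(A), centroid(B))

A = [(6, 4, 7, 3, 2, 3), (7, 2, 6, 4, 1, 3)]  # cases 1 and 3
B = [(2, 3, 1, 4, 5, 4), (1, 3, 2, 2, 6, 4)]  # cases 2 and 5

print(centroid_distance(A, B))       # sqrt(67.25), about 8.20
print(ward_increase(A, B))           # 67.25
print(sse(A + B) - sse(A) - sse(B))  # 67.25 again, computed directly
```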
15
Hierarchical Clustering Output
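An agglomeration schedule like the one in the hierarchical output can be produced by a naive implementation of Ward's method on the raw data. This sketch is for illustration only and rests on three assumptions:

- It uses squared Euclidean distance.
- It reports the cumulative within-cluster sum of squares as the coefficient.
- It breaks ties by taking the first pair found.

As a result, the stage order and coefficients are not guaranteed to match the SPSS output exactly.

```python
from itertools import combinations

# Raw data (cases 1..20, variables V1..V6) from the Raw Data slide
DATA = [
    (6, 4, 7, 3, 2, 3), (2, 3, 1, 4, 5, 4), (7, 2, 6, 4, 1, 3), (4, 6, 4, 5, 3, 6),
    (1, 3, 2, 2, 6, 4), (6, 4, 6, 3, 3, 4), (5, 3, 6, 3, 3, 4), (7, 3, 7, 4, 1, 4),
    (2, 4, 3, 3, 6, 3), (3, 5, 3, 6, 4, 6), (1, 3, 2, 3, 5, 3), (5, 4, 5, 4, 2, 4),
    (2, 2, 1, 5, 4, 4), (4, 6, 4, 6, 4, 7), (6, 5, 4, 2, 1, 4), (3, 5, 4, 6, 4, 7),
    (4, 4, 7, 2, 2, 5), (3, 7, 2, 6, 4, 3), (4, 6, 3, 7, 2, 7), (2, 3, 2, 4, 7, 2),
]

def centroid(data, members):
    return [sum(data[i][k] for i in members) / len(members) for k in range(len(data[0]))]

def ward_cost(data, a, b):
    # increase in total within-cluster sum of squares if clusters a and b merge
    ca, cb = centroid(data, a), centroid(data, b)
    sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq

def ward_schedule(data):
    """Naive Ward agglomeration over all cases.

    Returns (schedule, sizes): schedule holds (stage, case_a, case_b, coefficient),
    where the coefficient is the cumulative within-cluster sum of squares;
    sizes maps 4, 3 and 2 clusters to the cluster sizes at that point.
    """
    clusters = {i: [i] for i in range(len(data))}  # key = smallest member index
    schedule, sizes, total = [], {}, 0.0
    while len(clusters) > 1:
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda pair: ward_cost(data, clusters[pair[0]], clusters[pair[1]]))
        total += ward_cost(data, clusters[a], clusters[b])
        clusters[a] = clusters[a] + clusters.pop(b)
        schedule.append((len(schedule) + 1, a + 1, b + 1, total))  # 1-based case numbers
        if len(clusters) in (4, 3, 2):
            sizes[len(clusters)] = sorted((len(m) for m in clusters.values()), reverse=True)
    return schedule, sizes

schedule, sizes = ward_schedule(DATA)
for stage, a, b, coef in schedule:
    print(f"{stage:2d}  {a:2d} {b:2d}  {coef:8.3f}")
print(sizes)
```

Large jumps in the coefficient between successive stages are what the "distance at which clusters are combined" criterion looks for. The final coefficient always equals the total sum of squares of the data around the grand mean.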
16
Vertical Icicle Plot
17
Dendrogram Using Ward's Method
[Dendrogram figure: cases identified by label and sequence number; horizontal axis: Rescaled Distance Cluster Combine, 0 to 25]
18
Cluster Membership
Number of members per cluster
Cluster  4 clusters  3 clusters  2 clusters
1        8           8           8
2        6           6           12
3        5           6
4        1
19
Deciding on the Number of Clusters
  • Guidelines for deciding on the number of clusters:
    • Theoretical, conceptual, or practical
      considerations may suggest a certain number of
      clusters.
    • In hierarchical clustering, the distances at which
      clusters are combined can be used as criteria.
      This information can be obtained from the
      agglomeration schedule or from the dendrogram.
    • In nonhierarchical clustering, the ratio of total
      within-group variance to between-group variance
      can be plotted against the number of clusters. The
      point at which an elbow or a sharp bend occurs
      indicates an appropriate number of clusters.
    • The relative sizes of the clusters should be
      meaningful. A simple frequency count of cluster
      membership in the Cluster Membership table shows
      that a three-cluster solution results in clusters
      with eight, six, and six elements. However, a
      four-cluster solution gives clusters of eight, six,
      five, and one. It is not meaningful to have a
      cluster with only one case.
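The last two guidelines can be checked numerically. The sketch below uses the three-cluster membership from the Excel centroid table. It counts the cluster sizes and computes one point of the elbow plot.

Two assumptions apply:

- Within- and between-group sums of squares stand in for the variances. In practice you would rerun the clustering for each number of clusters and plot the ratio.
- `MIN_SIZE` is a hypothetical threshold.

```python
from collections import Counter

# Raw data (cases 1..20, V1..V6)
DATA = [
    (6, 4, 7, 3, 2, 3), (2, 3, 1, 4, 5, 4), (7, 2, 6, 4, 1, 3), (4, 6, 4, 5, 3, 6),
    (1, 3, 2, 2, 6, 4), (6, 4, 6, 3, 3, 4), (5, 3, 6, 3, 3, 4), (7, 3, 7, 4, 1, 4),
    (2, 4, 3, 3, 6, 3), (3, 5, 3, 6, 4, 6), (1, 3, 2, 3, 5, 3), (5, 4, 5, 4, 2, 4),
    (2, 2, 1, 5, 4, 4), (4, 6, 4, 6, 4, 7), (6, 5, 4, 2, 1, 4), (3, 5, 4, 6, 4, 7),
    (4, 4, 7, 2, 2, 5), (3, 7, 2, 6, 4, 3), (4, 6, 3, 7, 2, 7), (2, 3, 2, 4, 7, 2),
]
# Three-cluster membership of cases 1..20, from the Excel centroid table
MEMBERSHIP = [1, 2, 1, 3, 2, 1, 1, 1, 2, 3, 2, 1, 2, 3, 1, 3, 1, 3, 3, 2]
MIN_SIZE = 2  # hypothetical: clusters smaller than this are not "meaningful"

def within_between(data, labels):
    """Within-group and between-group sums of squares, summed over all variables."""
    p = len(data[0])
    grand = [sum(row[k] for row in data) / len(data) for k in range(p)]
    within = between = 0.0
    for g in set(labels):
        rows = [row for row, lab in zip(data, labels) if lab == g]
        mean = [sum(r[k] for r in rows) / len(rows) for k in range(p)]
        within += sum((r[k] - mean[k]) ** 2 for r in rows for k in range(p))
        between += len(rows) * sum((mean[k] - grand[k]) ** 2 for k in range(p))
    return within, between

sizes = Counter(MEMBERSHIP)
print(sorted(sizes.items()))                          # [(1, 8), (2, 6), (3, 6)]
print([g for g, n in sizes.items() if n < MIN_SIZE])  # []: no tiny clusters
w, b = within_between(DATA, MEMBERSHIP)
print(f"W/B ratio for k = 3: {w / b:.4f}")
```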

20
Cluster Centroids
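Cluster centroid values can be obtained from the K-means
cluster output (see Final Cluster Centers). Note that the
K-means output numbers the clusters differently from the
Excel table that follows: its Cluster 3 corresponds to
Cluster 1 in that table, and its Cluster 1 to Cluster 3.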
21
Calculating Cluster Centroids Using MS Excel
Resp No   V1    V2    V3    V4    V5    V6    Cluster
1         6     4     7     3     2     3     1
3         7     2     6     4     1     3     1
6         6     4     6     3     3     4     1
7         5     3     6     3     3     4     1
8         7     3     7     4     1     4     1
12        5     4     5     4     2     4     1
15        6     5     4     2     1     4     1
17        4     4     7     2     2     5     1
Centroid  5.75  3.63  6.00  3.13  1.88  3.88  (Cluster 1)

2         2     3     1     4     5     4     2
5         1     3     2     2     6     4     2
9         2     4     3     3     6     3     2
11        1     3     2     3     5     3     2
13        2     2     1     5     4     4     2
20        2     3     2     4     7     2     2
Centroid  1.67  3.00  1.83  3.50  5.50  3.33  (Cluster 2)

4         4     6     4     5     3     6     3
10        3     5     3     6     4     6     3
14        4     6     4     6     4     7     3
16        3     5     4     6     4     7     3
18        3     7     2     6     4     3     3
19        4     6     3     7     2     7     3
Centroid  3.50  5.83  3.33  6.00  3.50  6.00  (Cluster 3)
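The same centroids can be computed outside Excel. This sketch averages each variable within each cluster, which is what an AVERAGE formula per column does in the spreadsheet. The data and membership are copied from the table above.

```python
# Raw data (cases 1..20, V1..V6)
DATA = [
    (6, 4, 7, 3, 2, 3), (2, 3, 1, 4, 5, 4), (7, 2, 6, 4, 1, 3), (4, 6, 4, 5, 3, 6),
    (1, 3, 2, 2, 6, 4), (6, 4, 6, 3, 3, 4), (5, 3, 6, 3, 3, 4), (7, 3, 7, 4, 1, 4),
    (2, 4, 3, 3, 6, 3), (3, 5, 3, 6, 4, 6), (1, 3, 2, 3, 5, 3), (5, 4, 5, 4, 2, 4),
    (2, 2, 1, 5, 4, 4), (4, 6, 4, 6, 4, 7), (6, 5, 4, 2, 1, 4), (3, 5, 4, 6, 4, 7),
    (4, 4, 7, 2, 2, 5), (3, 7, 2, 6, 4, 3), (4, 6, 3, 7, 2, 7), (2, 3, 2, 4, 7, 2),
]
# Cluster membership of cases 1..20 (Excel-table numbering)
MEMBERSHIP = [1, 2, 1, 3, 2, 1, 1, 1, 2, 3, 2, 1, 2, 3, 1, 3, 1, 3, 3, 2]

def cluster_centroids(data, labels):
    """Mean of every variable within each cluster, keyed by cluster number."""
    out = {}
    for g in sorted(set(labels)):
        rows = [row for row, lab in zip(data, labels) if lab == g]
        out[g] = [sum(col) / len(rows) for col in zip(*rows)]  # column-wise means
    return out

for g, c in cluster_centroids(DATA, MEMBERSHIP).items():
    print(g, [round(v, 4) for v in c])
```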
22
Interpretation and Profiling of the Clusters
From the Cluster Centroid table, Cluster 1 has relatively
high values on V1 (Shopping is fun) and V3 (I combine
shopping with eating out), so this cluster can be labeled
"fun-loving and concerned shoppers". Cluster 2 has a
relatively high value on V5 (I don't care about shopping),
so it can be labeled "apathetic shoppers". Cluster 3 has
relatively high values on V2 (Shopping is bad for your
budget), V4 (I try to get the best buys while shopping),
and V6 (You can save a lot of money by comparing prices),
so it can be labeled "economical shoppers".
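The naming step can be made semi-mechanical. For each cluster, flag the variables whose centroid is well above the overall mean. The 0.5-point threshold below is a hypothetical choice that the slides do not specify. With it, the flagged variables match the interpretation above.

```python
# Centroids as printed in the Excel table (cluster -> V1..V6) and cluster sizes
CENTROIDS = {
    1: [5.75, 3.63, 6.00, 3.13, 1.88, 3.88],
    2: [1.67, 3.00, 1.83, 3.50, 5.50, 3.33],
    3: [3.50, 5.83, 3.33, 6.00, 3.50, 6.00],
}
SIZES = {1: 8, 2: 6, 3: 6}
NAMES = ["V1", "V2", "V3", "V4", "V5", "V6"]
THRESHOLD = 0.5  # hypothetical: "relatively high" = more than 0.5 above the overall mean

def profile(centroids, sizes, threshold=THRESHOLD):
    n = sum(sizes.values())
    # overall mean of each variable = size-weighted average of the centroids
    overall = [sum(centroids[g][k] * sizes[g] for g in centroids) / n
               for k in range(len(NAMES))]
    return {g: [NAMES[k] for k in range(len(NAMES)) if c[k] - overall[k] > threshold]
            for g, c in centroids.items()}

print(profile(CENTROIDS, SIZES))
# {1: ['V1', 'V3'], 2: ['V5'], 3: ['V2', 'V4', 'V6']}
```

The labels themselves ("fun-loving", "apathetic", "economical") still come from reading the item wording. The code only narrows down which items to read.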
23
Assessing Reliability and Validity
Formal procedures for assessing the reliability and
validity of clustering solutions are complex. The
following procedures are adequate for checking the
quality of the clustering results:
1. Perform cluster analysis on the same data using
   different distance measures. Compare the results
   across measures to determine the stability of the
   solutions.
2. Use different methods of clustering and compare
   the results.
3. Split the data randomly into halves. Perform
   clustering separately on each half. Compare cluster
   centroids across the two subsamples.
4. Delete variables randomly. Perform clustering
   based on the reduced set of variables. Compare the
   results with those obtained by clustering based on
   the entire set of variables.
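The slides do not say how to "compare the results". One common option, swapped in here as an assumption, is the Rand index: the share of case pairs that two partitions treat the same way (both in one cluster, or both in different clusters).

The sketch compares the Excel-table membership with the K-means membership listed in the nonhierarchical output. Because the index ignores cluster labels, the different numbering does not matter.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Share of case pairs on which two partitions agree (together in both, or apart in both)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)

# Same 20 cases, two numbering schemes
excel = [1, 2, 1, 3, 2, 1, 1, 1, 2, 3, 2, 1, 2, 3, 1, 3, 1, 3, 3, 2]
kmeans = [3, 2, 3, 1, 2, 3, 3, 3, 2, 1, 2, 3, 2, 1, 3, 1, 3, 1, 1, 2]

print(rand_index(excel, kmeans))  # 1.0: identical partitions, only the labels differ
```

A value near 1 across distance measures, methods, or split halves suggests a stable solution. Chance-corrected variants such as the adjusted Rand index are often preferred for formal reporting.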
24
Results of Nonhierarchical Clustering
Initial Cluster Centers
Cluster  V1      V2      V3      V4      V5      V6
1        4.0000  6.0000  3.0000  7.0000  2.0000  7.0000
2        2.0000  3.0000  2.0000  4.0000  7.0000  2.0000
3        7.0000  2.0000  6.0000  4.0000  1.0000  3.0000
Classification Cluster Centers
Cluster  V1      V2      V3      V4      V5      V6
1        3.8135  5.8992  3.2522  6.4891  2.5149  6.6957
2        1.8507  3.0234  1.8327  3.7864  6.4436  2.5056
3        6.3558  2.8356  6.1576  3.6736  1.3047  3.2010
Case Listing of Cluster Membership
Case ID  Cluster  Distance    Case ID  Cluster  Distance
1        3        1.780       11       2        1.923
2        2        2.254       12       3        2.400
3        3        1.174       13       2        3.382
4        1        1.882       14       1        1.772
5        2        2.525       15       3        3.605
6        3        2.340       16       1        2.137
7        3        1.862       17       3        3.760
8        3        1.410       18       1        4.421
9        2        1.843       19       1        0.853
10       1        2.112       20       2        0.813
25
Final Cluster Centers
Cluster  V1      V2      V3      V4      V5      V6
1        3.5000  5.8333  3.3333  6.0000  3.5000  6.0000
2        1.6667  3.0000  1.8333  3.5000  5.5000  3.3333
3        5.7500  3.6250  6.0000  3.1250  1.7500  3.8750
Distances between Final Cluster Centers
Cluster  1       2       3
1        0.0000
2        5.5678  0.0000
3        5.7353  6.9944  0.0000
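The nonhierarchical results can be approximated with a plain Lloyd's k-means started from the initial cluster centers above, which coincide with cases 19, 20, and 3. This is a sketch, not the SPSS procedure that produced the output. The "Classification Cluster Centers" suggest an intermediate update step that this sketch does not reproduce.

```python
import math

# Raw data (cases 1..20, V1..V6)
DATA = [
    (6, 4, 7, 3, 2, 3), (2, 3, 1, 4, 5, 4), (7, 2, 6, 4, 1, 3), (4, 6, 4, 5, 3, 6),
    (1, 3, 2, 2, 6, 4), (6, 4, 6, 3, 3, 4), (5, 3, 6, 3, 3, 4), (7, 3, 7, 4, 1, 4),
    (2, 4, 3, 3, 6, 3), (3, 5, 3, 6, 4, 6), (1, 3, 2, 3, 5, 3), (5, 4, 5, 4, 2, 4),
    (2, 2, 1, 5, 4, 4), (4, 6, 4, 6, 4, 7), (6, 5, 4, 2, 1, 4), (3, 5, 4, 6, 4, 7),
    (4, 4, 7, 2, 2, 5), (3, 7, 2, 6, 4, 3), (4, 6, 3, 7, 2, 7), (2, 3, 2, 4, 7, 2),
]
# Seeds from the "Initial Cluster Centers" table
INITIAL = [(4, 6, 3, 7, 2, 7), (2, 3, 2, 4, 7, 2), (7, 2, 6, 4, 1, 3)]

def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def kmeans(data, seeds, max_iter=100):
    """Lloyd's k-means: assign each case to its nearest center, recompute means, repeat."""
    centers = [list(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda j: sq_dist(row, centers[j]))
                      for row in data]
        if new_labels == labels:
            break  # no case changed cluster: converged
        labels = new_labels
        for j in range(len(centers)):
            rows = [row for row, lab in zip(data, labels) if lab == j]
            if rows:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(rows) for col in zip(*rows)]
    return labels, centers

labels, centers = kmeans(DATA, INITIAL)
print([labels.count(j) for j in range(3)])  # cases per cluster: [6, 6, 8]
for j, c in enumerate(centers, start=1):
    print(j, [round(v, 4) for v in c])
for i in range(3):
    for j in range(i + 1, 3):
        print(f"d({i + 1},{j + 1}) = {math.sqrt(sq_dist(centers[i], centers[j])):.4f}")
```

What matches the printed output when run on the raw data as transcribed on the Raw Data slide:

- The case assignments match the case listing.
- Center 1 equals Final Cluster Center 1.
- The distance between centers 1 and 2 is √31 ≈ 5.5678.

What differs: the V5 center of cluster 3 comes out as 15/8 = 1.875 (the 1.88 of the Excel table) rather than 1.7500. Distances involving cluster 3 therefore differ slightly from the printed ones.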
Analysis of Variance
Variable  Cluster MS  df  Error MS  df  F        p
V1        29.1083     2   0.6078    17  47.8879  .000
V2        13.5458     2   0.6299    17  21.5047  .000
V3        31.3917     2   0.8333    17  37.6700  .000
V4        15.7125     2   0.7279    17  21.5848  .000
V5        24.1500     2   0.7353    17  32.8440  .000
V6        12.1708     2   1.0711    17  11.3632  .001
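The F ratios in the Analysis of Variance table are one-way ANOVAs per variable, with the cluster as the factor. The sketch below recomputes them from the raw data and the K-means membership in the case listing. The function name is illustrative.

```python
# Raw data (cases 1..20, V1..V6)
DATA = [
    (6, 4, 7, 3, 2, 3), (2, 3, 1, 4, 5, 4), (7, 2, 6, 4, 1, 3), (4, 6, 4, 5, 3, 6),
    (1, 3, 2, 2, 6, 4), (6, 4, 6, 3, 3, 4), (5, 3, 6, 3, 3, 4), (7, 3, 7, 4, 1, 4),
    (2, 4, 3, 3, 6, 3), (3, 5, 3, 6, 4, 6), (1, 3, 2, 3, 5, 3), (5, 4, 5, 4, 2, 4),
    (2, 2, 1, 5, 4, 4), (4, 6, 4, 6, 4, 7), (6, 5, 4, 2, 1, 4), (3, 5, 4, 6, 4, 7),
    (4, 4, 7, 2, 2, 5), (3, 7, 2, 6, 4, 3), (4, 6, 3, 7, 2, 7), (2, 3, 2, 4, 7, 2),
]
# K-means membership of cases 1..20 (SPSS numbering, from the case listing)
LABELS = [3, 2, 3, 1, 2, 3, 3, 3, 2, 1, 2, 3, 2, 1, 3, 1, 3, 1, 1, 2]

def anova_f(data, labels, var):
    """One-way ANOVA for one variable: returns (cluster MS, error MS, F)."""
    values = [row[var] for row in data]
    groups = sorted(set(labels))
    n, k = len(values), len(groups)
    grand = sum(values) / n
    ss_between = ss_within = 0.0
    for g in groups:
        vals = [v for v, lab in zip(values, labels) if lab == g]
        mean = sum(vals) / len(vals)
        ss_between += len(vals) * (mean - grand) ** 2
        ss_within += sum((v - mean) ** 2 for v in vals)
    ms_cluster = ss_between / (k - 1)  # "Cluster MS", df = k - 1
    ms_error = ss_within / (n - k)     # "Error MS", df = n - k
    return ms_cluster, ms_error, ms_cluster / ms_error

for var in range(6):
    ms_c, ms_e, f = anova_f(DATA, LABELS, var)
    print(f"V{var + 1}: cluster MS={ms_c:.4f}  error MS={ms_e:.4f}  F={f:.4f}")
```

For V1 this gives Cluster MS 29.1083, Error MS 0.6078, and F 47.8879, matching the table. The p value would additionally need the F(2, 17) distribution (for example `scipy.stats.f.sf`), which the standard library lacks.

V5 will not match exactly. The transcribed raw data give a cluster-3 V5 mean of 1.875 rather than the 1.7500 shown in the Final Cluster Centers. Because the clusters were formed to differ, these F tests describe separation rather than test a hypothesis in the usual sense.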
Number of Cases in each Cluster
Cluster  Unweighted Cases  Weighted Cases
1        6                 6
2        6                 6
3        8                 8
Missing  0
Total    20                20