DATA CLUSTERING WITH KERNEL K-MEANS
Matt Strautmann, Dept. of Electrical and Computer Engineering
Dr. Donald C. Wunsch II, Dept. of Electrical and Computer Engineering
- PROJECT OBJECTIVES
- PROJECT GOAL
- Experimentally demonstrate the application of Kernel K-Means to non-linearly clusterable data sets
- ACADEMIC IMPORTANCE
- Expand the application of the Kernel K-Means clustering algorithm to non-traditional uses
- PROJECT DATASETS
- DISCUSSION
- Kernel K-Means was found to cluster the test datasets in a superior manner over Soft K-Means
- Kernel data-mapping was seen to solve the overlapping data sets (see the toy sketch after this list) by
  - Mapping the data before clustering to a higher-dimensional feature space using a nonlinear function
  - Partitioning the points with linear separators in the new space
- Soft K-Means could not successfully cluster the Lung Cancer Dataset; only one of the three clusters was successfully recovered
- Soft K-Means clustered the two-dimension, two-cluster Gaussian dataset with only one error out of the one thousand data points
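The kernel data-mapping idea can be illustrated with a toy example that is not part of the poster's experiments: two concentric rings cannot be split by a line in two dimensions, but the explicit nonlinear map (x1, x2) -> (x1, x2, x1^2 + x2^2) makes them separable by a plane, i.e. a linear separator, in the mapped space.

```python
# Toy illustration of the kernel data-mapping idea (not the poster's experiment):
# map overlapping/non-linearly-separable 2-D data into a 3-D feature space where
# a plane (linear separator) recovers the two groups.
import numpy as np

rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 200)
radii = np.concatenate([rng.normal(1.0, 0.1, 100),    # inner ring
                        rng.normal(3.0, 0.1, 100)])   # outer ring
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])

Z = np.column_stack([X, (X ** 2).sum(axis=1)])        # nonlinear feature map
threshold = 4.0                                       # plane z = 4 in the feature space
labels = (Z[:, 2] > threshold).astype(int)            # linear separator recovers the rings
print(np.bincount(labels))                            # roughly 100 points per ring
```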
Iris Plant Dataset (image: eleves.ens.fr)
- BACKGROUND
- WHAT IS K-MEANS CLUSTERING?
- K-Means clustering aims to divide the dataset into clusters (groups) in which each data point belongs to the cluster with the nearest mean vector (a minimal sketch follows)
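A minimal sketch of the standard K-Means loop described above, assuming Euclidean distance and randomly chosen initial means; this is illustrative code, not the poster's implementation.

```python
# Minimal K-Means sketch: alternate between assigning points to the nearest
# mean and updating each mean to the centroid of its assigned points.
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=k, replace=False)]   # random initial means
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster with the nearest mean vector
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each mean becomes the centroid of its assigned points
        new_means = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                              else means[j] for j in range(k)])
        if np.allclose(new_means, means):
            break
        means = new_means
    return labels, means
```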
- WHAT IS KERNEL K-MEANS?
- A sum-of-squares clustering algorithm
- A two-step process: data point assignment and mean update (a kernel-space sketch follows this list)
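A hedged sketch of the Kernel K-Means assignment/update idea, assuming an RBF kernel with an illustrative gamma value (not a parameter reported on the poster); the cluster means in feature space are never formed explicitly, and all distances are computed through the kernel matrix.

```python
# Kernel K-Means sketch: distances to each cluster mean in feature space are
# computed from the kernel matrix, ||phi(x_i) - m_c||^2 = K_ii - 2*mean_j K_ij + mean_jl K_jl.
import numpy as np

def rbf_kernel(X, gamma=1.0):
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

def kernel_kmeans(X, k, n_iter=50, gamma=1.0, seed=0):
    K = rbf_kernel(X, gamma)
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=len(X))            # random initial assignment
    for _ in range(n_iter):
        dist = np.zeros((len(X), k))
        for c in range(k):
            idx = np.flatnonzero(labels == c)
            if len(idx) == 0:
                dist[:, c] = np.inf                     # skip empty clusters
                continue
            dist[:, c] = (np.diag(K)
                          - 2 * K[:, idx].mean(axis=1)
                          + K[np.ix_(idx, idx)].mean())
        new_labels = dist.argmin(axis=1)                # assignment step in kernel space
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```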
- WHAT IS THE PLUS PLUS INITIALIZATION SCHEME?
- The first mean vector is a randomly selected data point
- Each subsequent mean vector is created by evaluating randomly selected data points against a distance-weighted probability (a sketch follows this list)
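A sketch of a PLUS PLUS (k-means++) style initialization consistent with the description above; the distance-weighted sampling is the standard formulation, and the code is illustrative rather than the poster's own.

```python
# k-means++ style initialization: first mean is a random point; each later mean
# is drawn with probability proportional to its squared distance from the
# nearest already-chosen mean.
import numpy as np

def plus_plus_init(X, k, seed=0):
    rng = np.random.default_rng(seed)
    means = [X[rng.integers(len(X))]]                   # first mean: random data point
    for _ in range(1, k):
        d2 = np.min(np.linalg.norm(X[:, None, :] - np.array(means)[None, :, :],
                                   axis=2) ** 2, axis=1)
        probs = d2 / d2.sum()                           # distance-weighted probability
        means.append(X[rng.choice(len(X), p=probs)])
    return np.array(means)
```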
2 Dimension, 2 Cluster Dataset (Gaussian 2D2K) (image: lans.ece.utexas.edu)
- CONCLUDING REMARKS
- The initialization was seen to be the most important factor in whether the algorithm converges
- The PLUS PLUS cluster mean initialization was seen to improve the results
- Kernel assignment works better than the maximum responsibility calculation of Soft K-Means (a responsibility sketch follows this list)
- Kernel K-Means handles both small- and large-dimension datasets well; the increase in dimensionality seemed to be advantageous for the Lung Cancer Dataset (56 dimensions) over the lower clustering accuracy of the Iris Plant Dataset (4 dimensions)
- Kernel K-Means produced superior results to Soft K-Means when clustering the Lung Cancer Dataset and demonstrated recognition of all three clusters
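For reference, the Soft K-Means maximum responsibility assignment mentioned above can be sketched as follows, assuming a stiffness parameter beta (an illustrative value, not one reported on the poster).

```python
# Soft K-Means responsibilities: each point gets a weight for every cluster,
# and the hard assignment used for scoring is the cluster of maximum responsibility.
import numpy as np

def soft_kmeans_responsibilities(X, means, beta=2.0):
    d2 = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2) ** 2
    w = np.exp(-beta * d2)
    return w / w.sum(axis=1, keepdims=True)   # responsibilities sum to 1 per point

# Hard assignment by maximum responsibility:
# labels = soft_kmeans_responsibilities(X, means).argmax(axis=1)
```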
- SOFT K-MEANS VS. KERNEL K-MEANS
| Soft K-Means | Clustering Accuracy Average (over ten runs) | Standard Deviation of Accuracy Calculation (over ten runs) | Variance of Accuracy Calculation (over ten runs) |
| --- | --- | --- | --- |
| Iris Plant Dataset | 28.00 | 8.218 | 2.867 |
| Lung Cancer Dataset | 43.75 | - | - |
| 2D2K Gaussian Dataset | 99.00 | - | - |
| 8D5K Gaussian Dataset | 58.50 | 2.082 | 0.043 |
| Kernel K-Means | Clustering Accuracy Average (over ten runs) | Standard Deviation of Accuracy Calculation (over ten runs) | Variance of Accuracy Calculation (over ten runs) |
| --- | --- | --- | --- |
| Iris Plant Dataset | 57.00 | 5.009 | 2.238 |
| Lung Cancer Dataset | 62.00 | 6.878 | 0.473 |
| 2D2K Gaussian Dataset | 96.50 | 1.677 | 0.028 |
| 8D5K Gaussian Dataset | 76.31 | 10.366 | 1.075 |
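The averages, standard deviations, and variances in the tables can be computed from the per-run accuracies with a short routine like the one below; the ten accuracy values shown are placeholders, not the poster's raw data.

```python
# Computing the per-dataset table statistics from ten run accuracies (placeholder values).
import numpy as np

accuracies = np.array([55.0, 60.0, 52.0, 58.0, 61.0, 54.0, 59.0, 57.0, 56.0, 58.0])
print("average:", accuracies.mean())
print("std dev:", accuracies.std(ddof=1))   # sample standard deviation over ten runs
print("variance:", accuracies.var(ddof=1))
```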
1.) Initial Mean Orientations
2.) Voronoi Diagram Generated by the Means (data points associated with nearest cluster mean)
- FUTURE WORK
- Further improvement of the mean vector initialization is believed possible over the PLUS PLUS initialization
- Other options for the mean-squared error calculation for data point evaluation are possible
- The time analysis of the algorithm must still be calculated
- RESULTS COMPARISON
- Kernel K-Means clustering accuracy was superior in all cases except the two-dimension, two-cluster dataset
- The clustering accuracy on the datasets changed by the following amounts (see the worked check below):
  - Iris Plant: +104%
  - Lung Cancer: +38%
  - 2D2K: -2.5%
  - 8D5K: +30%
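A worked check of these figures, assuming they are relative percentage changes of the Kernel K-Means accuracy over the Soft K-Means benchmark from the tables above.

```python
# Relative change of Kernel K-Means accuracy over the Soft K-Means benchmark,
# using the accuracy averages reported in the tables.
soft   = {"Iris": 28.00, "Lung": 43.75, "2D2K": 99.00, "8D5K": 58.50}
kernel = {"Iris": 57.00, "Lung": 62.00, "2D2K": 96.50, "8D5K": 76.31}

for name in soft:
    change = 100 * (kernel[name] - soft[name]) / soft[name]
    print(f"{name}: {change:+.1f}%")   # Iris is about +104%, 2D2K about -2.5%
```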
(Figure images: http://en.wikipedia.org/wiki/K-means_clustering)
3.) Cluster Centroid Becomes New Cluster Mean
4.) Steps 2 and 3 Repeated until Convergence
- APPROACH
- Evaluate standard (Soft) K-Means against the four datasets to form a benchmark
- Hybridize Soft K-Means with a kernel data-mapping to form Kernel K-Means
- Test Kernel K-Means on small-size, small-dimension Gaussian, large-dimension Gaussian, and large-size datasets
- ACKNOWLEDGEMENTS
- The author would like to acknowledge the expertise of Dr. Rui Xu in advising this project.