Title: A Mixed-Integer Programming Approach to Customer Segmentation Problem
1. A Mixed-Integer Programming Approach to Customer Segmentation Problem
- Burcu Saglam, F. Sibel Salman, Metin Türkay
- bsaglam,ssalman,mturkay_at_ku.edu.tr
- Dept. of Industrial Engineering
- Serpil Sayin
- ssayin_at_ku.edu.tr
- Dept. of Business Administration
- June 20, 2004
- ESI 2004, METU, Ankara
2. Koç University, Istanbul
www.ku.edu.tr www.eng.ku.edu.tr
3. Outline
- Introduction
- Clustering Problem
- Clustering Approaches
- Motivation of the Study
- Proposed Model
- Illustrative Example
- Evaluation in a Real-World Scenario
- Conclusions and Future Work
4. Introduction
- This study presents a new mathematical programming-based segmentation model that is applied to a digital platform company's customer database
5. Digiturk
- Private digital platform
- Eager to identify opportunities in customer relationship management, such as one-to-one marketing
6. Digiturk
- Pay-Per-View Services
- Vision halls
- Football matches
- Erotic channels
- Interactive Events
- Banking
- TV games, etc.
- Products
- Standard package
- Sports package
- Cinema package
- Super package
- Mega package
7. Why Data Mining and Segmentation?
- Analysis of large data collections and huge databases
- Understanding the needs, desires, and expectations of customers
- Grouping ongoing and potential customers
- Hidden patterns and knowledge within the data
- Segmentation is applied when there is a need to partition the instances into natural groups
8. Clustering Analysis
- A data mining technique developed for the purpose of identifying groups of entities that are similar to each other with respect to certain characteristics
- Dividing heterogeneous sets of data into smaller, homogeneous ones
- Evaluating the result and performance of a supervised learning model
- Analyzing the set of input attributes
- Determining outliers
9. Clustering Problem
- Given a data set with n data items in m dimensions
- Partition the data into k clusters
- In an optimization setting, an objective function can be defined, such as minimizing the sum of 1-norm distances between each data point and the center of the cluster to which it belongs (Bradley et al., 1997)
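The sum-of-1-norm-distances objective can be sketched in a few lines of Python; the points, centers, and assignment below are illustrative, not data from the talk.

```python
# Sum of 1-norm distances between each point and the center of its cluster,
# the objective discussed by Bradley et al. (1997). Toy data for illustration.

def one_norm(p, q):
    """1-norm (Manhattan) distance between two m-dimensional points."""
    return sum(abs(a - b) for a, b in zip(p, q))

def clustering_cost(points, centers, assign):
    """assign[i] is the index of the cluster that point i belongs to."""
    return sum(one_norm(p, centers[assign[i]]) for i, p in enumerate(points))

points = [(0.0, 0.0), (1.0, 0.0), (9.0, 9.0), (10.0, 10.0)]
centers = [(0.5, 0.0), (9.5, 9.5)]
assign = [0, 0, 1, 1]
print(clustering_cost(points, centers, assign))  # prints 3.0
```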
10. Main Considerations
- The notion of similarity
- Exclusive, overlapping, probabilistic, or fuzzy clusters
- Iterative or non-iterative
- Hierarchical or non-hierarchical
- Distance-based or probability-based approaches, graph-theoretic methods, continuous/discrete optimization, ...
11. Analytical Clustering Methods
- Hierarchical: the number of clusters is not assumed to be known a priori
- Divisive and agglomerative methods
- Once an assignment is made, it is irrevocable
- Well-known BIRCH algorithm
12. Analytical Clustering Methods
- Non-hierarchical: the number of clusters is known a priori
- Initially, the data is divided into k partitions, where each partition represents a cluster
- Two main decisions
- Selection of the initial cluster centroids
- Assignment of the instances to clusters
- Sensitive to the initial partition
- Too many local minima
- K-Means, K-Medoids, CLARANS, etc.
13. Classical K-Means
- Iterative, distance-based
- Works in numeric domains
- Partitions instances into disjoint clusters
- Two steps
- Assignment
- Updating the cluster centers
- Works well when the candidate clusters are of approximately equal size
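The two alternating steps can be sketched in pure Python. This is a minimal illustration, not the implementation used in the study; for simplicity it seeds the centers with the first k points, whereas real implementations use random or smarter initialization.

```python
# Minimal k-means sketch: alternate the assignment step and the
# center-update step until the assignment stops changing.
# Initial centers are simply the first k points (illustrative choice).

def kmeans(points, k, iters=100):
    centers = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_assign = [
            min(range(k), key=lambda j: sum((a - b) ** 2
                for a, b in zip(p, centers[j])))
            for p in points
        ]
        if new_assign == assign:
            break
        assign = new_assign
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return centers, assign

points = [(0.0, 0.0), (1.0, 0.0), (9.0, 9.0), (10.0, 10.0)]
centers, assign = kmeans(points, 2)
print(assign)  # prints [0, 0, 1, 1]
```

Because only the assignment/update loop is shown, the sensitivity to the initial centers discussed on the next slide is easy to see: a different seeding of `centers` can converge to a different local minimum.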
14. Shortcomings of K-Means
- The solution is a local minimum
- Convergence to a local optimum is proven
- Sensitivity to the initially selected cluster centers
- Worst-case time complexity is stated to be exponential
- To approach a global minimum, the algorithm has to be repeated several times
- Impossible to interpret which attributes are significant
15. Motivation of the Study
- Considering the limitations of existing clustering approaches and algorithms, an exact, non-hierarchical, distance-based clustering algorithm is proposed
16. Proposed Approach
- Given a data set of n data items in m dimensions
- The aim is to find the optimal partitioning of the data set into k exclusive clusters
- Objective function: minimization of the maximum diameter of the generated clusters
- The number of clusters is known a priori
17. MIP-Max Model
Let d_il denote the distance between data items i and l, and let the binary variable x_ij equal 1 if item i is assigned to cluster j, 0 otherwise. With D_max denoting the maximum cluster diameter, the model is:

Minimize  D_max
s.t.      sum_{j=1,...,k} x_ij = 1                 for all i
          D_max >= d_il (x_ij + x_lj - 1)          for all i < l and all j
          x_ij in {0, 1}
18. MIP-Max
- O(kn) variables and O(kn2) constraints
- Non-hierarchical
- Not iterative
- No need for an initial solution
- Finds the global optimum
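Solving the MIP requires an integer-programming solver, so as a solver-free illustration of the objective, the sketch below enumerates every assignment of n points to k clusters and keeps the one whose largest cluster diameter is smallest. This brute force is exponential in n and only viable on toy instances; it is not the MIP-Max model itself, but on small data it returns the same global optimum the model guarantees.

```python
# Brute-force illustration of the min-max-diameter objective:
# try all k^n assignments and keep the one with the smallest maximum
# cluster diameter. Exponential -- for real data the MIP model is used.
from itertools import product

def one_norm(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def max_diameter(points, assign, k):
    """Largest pairwise distance within any single cluster."""
    diam = 0.0
    for j in range(k):
        members = [p for p, a in zip(points, assign) if a == j]
        for i in range(len(members)):
            for l in range(i + 1, len(members)):
                diam = max(diam, one_norm(members[i], members[l]))
    return diam

def min_max_diameter(points, k):
    best = min(product(range(k), repeat=len(points)),
               key=lambda a: max_diameter(points, a, k))
    return max_diameter(points, best, k), best

points = [(0.0, 0.0), (1.0, 0.0), (9.0, 9.0), (10.0, 10.0)]
d, assign = min_max_diameter(points, 2)
print(d)  # prints 2.0, the optimal maximum diameter
```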
19. Illustrative Example
20. Comparison with the Results of K-Means
21. Comparison with the Results of K-Means
22. Evaluation in a Real-World Scenario
- The data set includes demographic and transactional information
- Each row represents a unique customer
- 18 real-valued and categorical attributes
23. Experiments with the MIP-Max Model
k = 2
CPU times are reported for a computer with a Pentium IV processor at 2.56 GHz and 1 GB of memory.
24. Comparison of MIP-Max with K-Means and Interpretations
- The approaches are compared on 100 data items using a 3-cluster solution
- The MIP-Max model grouped 39 instances in the first cluster, 34 in the second, and 27 in the third
- K-Means generated clusters 1, 2, and 3 with 59, 22, and 19 instances, respectively
- Interpretations are based on the predictiveness score
25. Predictiveness Score
- Given class C and attribute A with values v1, v2, ..., vn, the attribute-value predictiveness score for vi is defined as the probability that an instance resides in C given that the instance has value vi for A.
- A between-class measure
- Defined for categorical attributes; most of our attributes are in nominal form
- An attribute has distinguishing power in a cluster if its predictiveness scores are higher than 75%
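Estimated from counts, the score is simply the conditional frequency P(cluster = C | A = v). A minimal sketch, on invented toy customer records (the `package` attribute and cluster labels below are illustrative, not from the study's data set):

```python
# Predictiveness score: P(instance is in `cluster` | record[attr] == value),
# estimated from counts over the records. Toy records for illustration only.

def predictiveness(records, attr, value, cluster):
    """Fraction of records having attr == value that lie in `cluster`."""
    with_value = [r for r in records if r[attr] == value]
    if not with_value:
        return 0.0
    in_cluster = [r for r in with_value if r["cluster"] == cluster]
    return len(in_cluster) / len(with_value)

records = [
    {"package": "sports", "cluster": 1},
    {"package": "sports", "cluster": 1},
    {"package": "sports", "cluster": 2},
    {"package": "cinema", "cluster": 2},
]
print(predictiveness(records, "package", "sports", 1))  # 2 of 3 sports records
```

With the 75% threshold above, "sports" (score 2/3) would not yet count as distinguishing for cluster 1 in this toy example, while "cinema" (score 1.0 for cluster 2) would.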
26. Predictiveness Score
27. Conclusions and Future Work
- The sensitivity of K-Means to the initial solution is analyzed
- The interpretation of the MIP-Max model is more meaningful than that of K-Means (e.g., sports-package subscribers are grouped together)
- MIP-Max is significantly better than K-Means in terms of quality and stability of the solutions
- Future work
- Improvement of run times
- Determination of the number of clusters
28. Thanks! Questions are welcome.
29. Clustering Approaches
- Hierarchical and non-hierarchical clustering methods
- Classical K-Means
- COBWEB
- CLARANS
- BIRCH
- Advantages and disadvantages
- Motivation of the study
30. COBWEB
- Conceptual clustering technique
- Forms a hierarchy to capture knowledge
- Deals only with categorical (nominal) data
- Cluster quality is measured by Category Utility
- This measure is expensive to compute
- Instance ordering has an impact on the resulting clustering
31. CLARANS
- A type of K-Medoids algorithm; differs by its randomized partial search strategy
- The clustering problem is represented by a graph
- Limitations
- Convergence to a definite local minimum
- Efficiency considerations
32. BIRCH
- Motivated by the fact that available memory is limited
- One iteration ends with a successful clustering
- Applicable to large data sets
- But sensitive to parameter settings
33. Experimental Results of MIP-Max
34. Comparison with the Results of K-Means