Title: Intelligent segmentation algorithms
1. Intelligent segmentation algorithms
- Segmentation is the partitioning of a set of items into segments.
- The segments are also called clusters, so segmentation problems are clustering problems.
- Because segmentation is partitioning, segmentation problems are also partitioning problems.
2. Segmentation
- Remember from the pricing of information products that knowing the value of the product to the customer is important.
- This value is the basis for versioning, where there are different versions for different groups of customers.
- Determining groups of customers is called market segmentation.
- Market segmentation is important not only for information products and services, but for almost everybody.
3. Modelling segmentation
- Let $V = \{e_1, e_2, \dots, e_n\}$ be a set and let $d(e_i, e_j)$, $i, j \in \{1, \dots, n\}$, be a distance function on the elements of V. (We use $d_{ij}$ as shorthand for $d(e_i, e_j)$.)
- The distance function d is such that points which are close have a negative distance, and points which are further away from each other have a positive distance.
- For instance, this is possible by introducing a constant B and letting d be the Euclidean distance minus the constant B.
4. Modelling segmentation
- The density D(U) of a subset U of V is defined as $D(U) = \sum_{e_i, e_j \in U} d_{ij}$.
- Segmentation is the problem of determining subsets $V_1, V_2, \dots, V_m$ of V, such that $V_k \cap V_l = \emptyset$ for all $k \neq l$ in $\{1, \dots, m\}$, and $\sum_{i,j} d_{ij} - 2\sum_{k=1}^{m} D(V_k)$ is maximized (a small computational sketch follows below).
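A minimal Python sketch of this model, under one plausible reading of the objective above; all function names, the constant B, and the toy data are my own illustration, not from the slides:

```python
import itertools
import math

def distance(p, q, B):
    """d_ij = Euclidean distance minus B, so nearby points get a negative distance."""
    return math.dist(p, q) - B

def density(points, subset, B):
    """D(U) = sum of d_ij over unordered pairs {e_i, e_j} inside U."""
    return sum(distance(points[i], points[j], B)
               for i, j in itertools.combinations(subset, 2))

def segmentation_value(points, segments, B):
    """Objective as reconstructed above: sum_{i,j} d_ij - 2 * sum_k D(V_k)."""
    n = len(points)
    total = sum(distance(points[i], points[j], B)     # sum over ordered pairs i != j
                for i in range(n) for j in range(n) if i != j)
    return total - 2 * sum(density(points, seg, B) for seg in segments)

# Toy data: two tight groups far apart; B lies between the two scales.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = [{0, 1}, {2, 3}]   # groups the close points together
bad = [{0, 2}, {1, 3}]    # mixes the groups
print(segmentation_value(points, good, B=3.0) > segmentation_value(points, bad, B=3.0))  # True
```

When the segments form a covering partition of V, this objective equals twice the total distance between points in different segments, which is consistent with MAX CUT appearing as a special case on slide 7.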
5. Graphical explanation
- [Figure: a reference segment of length B compared with pairwise distances; pairs closer than B get a negative $d_{ij}$, pairs farther apart than B get a positive $d_{ij}$.]
6. Variants of this model
- The distance function contains only nonnegative values.
- The number of subsets is an input parameter.
- The number of subsets is a constant.
Other models and variants are possible.
7. Complexity?
- MAX CUT is a special case of segmentation into 2 clusters, where all distances are nonnegative.
- The decision version of MAX CUT is NP-complete.
8. What are the consequences for other variants?
- Segmenting into k subsets, where k is an input parameter, is NP-hard; its decision version is NP-complete (since k = 2 is a possible input).
- What about the general case? Exercise!
9. Algorithms for NP-hard optimization problems
10. Algorithms for clustering problems
- Neural networks (e.g. Kohonen)
- Other AI heuristics
- Local search
- Tabu search
- Simulated annealing
- Genetic algorithms
- Mathematical programming
11. Question
- Which problem is solved by the AI heuristics that you have learned? Exercise!
12. Local Search
- Principles
- A combinatorial problem is defined on a finite set S of solutions s, each of which has a solution value v(s). A maximization problem can now be defined as finding a solution which maximizes v(s) over all elements of S.
- Local search is based on a neighborhood structure on S.
- A neighborhood structure defines a relation on the elements of S, where some s, t in S are neighbors and others are not.
13. Local search example: Max Cut
- [Figure: an example graph. Visible arcs have distance 1, others have distance 0.]
- Question: Partition the vertex set V into two disjoint subsets U and V\U so as to maximize $\sum_{v_i \in U,\, v_j \in V \setminus U} d_{ij}$.
14. Example (cont.)
- Each subset U (together with V\U) is a solution, whose solution value equals the number of arcs in the cut.
- Thus, the set of all subsets of V is the solution space.
- We call two elements of the solution space neighbors if they have the same cardinality, say k, and their intersection has cardinality k-1 (see the sketch below).
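A small Python sketch of the Max Cut solution value and of this neighborhood, in which one element of U is swapped with one element of V \ U; the function names and the toy graph are my own, not from the slides:

```python
def cut_value(edges, U):
    """Number of edges with exactly one endpoint in U (the distance-1 arcs in the cut)."""
    return sum(1 for (i, j) in edges if (i in U) != (j in U))

def swap_neighbors(U, V):
    """All subsets obtained from U by swapping one element of U with one of V \\ U."""
    for u in U:
        for v in V - U:
            yield (U - {u}) | {v}

# Toy instance: a 4-cycle 0-1-2-3-0.
V = {0, 1, 2, 3}
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
U = {0, 1}
print(cut_value(edges, U))                                      # 2
print(max(cut_value(edges, W) for W in swap_neighbors(U, V)))   # 4, reached e.g. by U = {1, 3}
```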
15. Other neighborhood structures
- Subsets U and U′ are neighbors if $|U' \setminus U| = 1$ and $|U \setminus U'| = 1$.
- Both of the previous two relationships.
- Subsets U and U′ are neighbors if $|U' \setminus U| = 1$ and $|U \setminus U'| = 2$.
- Both of the previous two.
16. Optimality of Local Search
- Is the graph defined by taking the solution space as vertex set and all neighborhood relations as arc set necessarily connected? (A small check follows below.)
- Finding good neighborhood structures is not trivial.
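For the swap neighborhood of slide 14 the answer is no: swapping preserves cardinality, so subsets of different sizes are never neighbors and the solution-space graph is disconnected. A small check makes this concrete (helper names are mine; reachability is computed by breadth-first search, and swap_neighbors is the same helper as in the Max Cut sketch above):

```python
from collections import deque
from itertools import combinations

def swap_neighbors(U, V):
    """Swap one element of U with one of V \\ U (same-cardinality neighborhood)."""
    for u in U:
        for v in V - U:
            yield (U - {u}) | {v}

def reachable(start, V, neighbors):
    """All solutions reachable from `start` in the solution-space graph."""
    seen = {frozenset(start)}
    queue = deque([frozenset(start)])
    while queue:
        cur = queue.popleft()
        for nxt in neighbors(set(cur), V):
            f = frozenset(nxt)
            if f not in seen:
                seen.add(f)
                queue.append(f)
    return seen

V = {0, 1, 2, 3}
all_subsets = {frozenset(c) for r in range(len(V) + 1) for c in combinations(V, r)}
print(len(reachable({0}, V, swap_neighbors)), len(all_subsets))   # 4 vs 16: not connected
```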
17. Local search
- Start with a random solution, or a cleverly constructed solution, and make this the current solution.
- Selection criterion: select a neighbor (for instance the one with maximum value).
- Acceptance criterion: make the neighbor the current solution if the acceptance criterion is satisfied.
- Termination criterion: stop after a maximum number of iterations, after a time limit, or if the improvement of the best solution is too slow (the loop is sketched below).
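A hedged, generic sketch of this loop with my own function and parameter names, using "best improving neighbor" as the selection and acceptance criteria:

```python
def local_search(initial, neighbors, value, max_iterations=1000):
    """Greedy local search: repeatedly move to the best neighbor while it improves."""
    current = initial
    for _ in range(max_iterations):                               # termination criterion
        best = max(neighbors(current), key=value, default=None)   # selection criterion
        if best is None or value(best) <= value(current):         # acceptance criterion
            return current                                        # local optimum reached
        current = best
    return current
```

For instance, with the Max Cut sketch above, local_search({0, 1}, lambda W: list(swap_neighbors(W, V)), lambda W: cut_value(edges, W)) climbs to a locally optimal cut.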
18. Acceptance criteria
- Accept every improvement.
- Accept any change.
- Tabu search: solutions (or solution structures) which were recently the current solution cannot become the current solution. There is a dynamic tabu list, and solutions on the tabu list cannot be selected. Improvement is not required (a sketch follows below).
- Tabu search prevents cycling and doesn't get stuck in local maxima.
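A minimal sketch of the tabu-search idea just described, assuming (as in the Max Cut example) that solutions are subsets of vertices; the tabu-list length and the helper names are my own assumptions:

```python
from collections import deque

def tabu_search(initial, neighbors, value, iterations=100, tabu_size=10):
    """Always move to the best non-tabu neighbor; improvement is not required."""
    current = best = initial
    tabu = deque([frozenset(initial)], maxlen=tabu_size)        # dynamic tabu list
    for _ in range(iterations):
        candidates = [n for n in neighbors(current) if frozenset(n) not in tabu]
        if not candidates:
            break
        current = max(candidates, key=value)     # may be worse than the current solution
        tabu.append(frozenset(current))
        if value(current) > value(best):
            best = current
    return best
```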
19. Acceptance criterion
- Simulated annealing: accept a solution with probability p, where p depends on the improvement in the objective function.
20. Simulated annealing
- Select a neighbor at random.
- Accept all improvements.
- Accept non-improvements with probability $\exp(-\Delta V / kT)$, where $\Delta V$ is the difference in solution value, k a constant, and T the temperature.
- Start with a high temperature and slowly decrease it to zero (a sketch follows below).
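A sketch of this scheme in Python; the starting temperature and the geometric cooling factor are assumptions of mine, since the slides only say to decrease T slowly:

```python
import math
import random

def simulated_annealing(initial, random_neighbor, value,
                        T0=10.0, cooling=0.999, k=1.0, iterations=10_000):
    """Accept improvements always, non-improvements with probability exp(-dV / kT)."""
    current = best = initial
    T = T0
    for _ in range(iterations):
        candidate = random_neighbor(current)           # select a neighbor at random
        delta = value(candidate) - value(current)      # negative for a non-improvement
        if delta >= 0 or random.random() < math.exp(delta / (k * T)):
            current = candidate
            if value(current) > value(best):
                best = current
        T *= cooling                                   # slowly decrease the temperature
    return best
```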
21. Simulated annealing examples
- k = 1, ΔV = 1, T = 4 ⇒ $e^{-1/4} = 1/e^{1/4} \approx 0.78$
- k = 1, ΔV = 1, T = 2 ⇒ $e^{-1/2} = 1/e^{1/2} \approx 0.61$
- k = 1, ΔV = 1, T = 1 ⇒ $e^{-1/1} = 1/e^{1/1} \approx 0.37$ (verified in the snippet below)
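These acceptance probabilities can be checked directly (a throwaway verification, not part of the slides):

```python
import math
for T in (4, 2, 1):
    print(T, round(math.exp(-1 / T), 2))   # 4 -> 0.78, 2 -> 0.61, 1 -> 0.37
```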
22. Properties of simulated annealing
- The probability of decreasing the value of the current solution decreases over time.
- At temperature zero, it is plain local search.
- If the cooling is slow enough (which means very slow), it finds an optimal solution with probability 1.
23. Lin-Kernighan
- Lin-Kernighan: choose one element from U and exchange it with the best possible element from V\U (maximizing the solution value).
- Make |U| (or |V\U|) steps in which every element is allowed to be selected only once.
- Choose the best solution encountered in these steps and repeat (a sketch for Max Cut follows below).
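A hedged sketch of one such pass for Max Cut, under my own reading of the steps above (helper names are not from the slides): each step swaps a not-yet-used element of U with the best element of V \ U, moves there even without improvement, and the best solution seen in the pass is returned.

```python
def lin_kernighan_pass(U, V, value):
    """One pass of |U| swap steps; each element may be chosen to leave at most once."""
    current = set(U)
    best = set(U)
    used = set()
    for _ in range(len(U)):
        candidates = [((current - {u}) | {v}, u)
                      for u in current - used
                      for v in V - current]
        if not candidates:
            break
        neighbor, u = max(candidates, key=lambda c: value(c[0]))
        used.add(u)
        current = neighbor                  # move even if the value does not improve
        if value(current) > value(best):
            best = current
    return best
```

With the earlier toy graph, lin_kernighan_pass({0, 1}, V, lambda W: cut_value(edges, W)) returns a subset with cut value 4.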
24. Genetic algorithms
- Consider the set S of all solutions as a population, where every solution is interpreted to be an individual characterized by its genetic material. The genetic material is encoded as a string.
25. Max Cut Modelled for a Genetic Algorithm
- Any solution, specified as U and V\U, is represented by a string of n elements, where element i (i = 1, ..., n) is 1 if $v_i \in U$, and zero otherwise (illustrated below).
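A small illustration of this encoding; the 0-based vertex indices are an assumption of this sketch, not of the slides:

```python
def encode(U, n):
    """Bit string of length n: bit i is 1 exactly when vertex i is in U."""
    return [1 if i in U else 0 for i in range(n)]

def decode(bits):
    """Recover U from the bit string."""
    return {i for i, b in enumerate(bits) if b == 1}

print(encode({0, 2}, 4))     # [1, 0, 1, 0]
print(decode([1, 0, 1, 0]))  # {0, 2}
```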
26. Principles of genetic algorithms
- Generations of populations are created by taking the current generation as parents and the next one as offspring.
- Selection specifies how parents are chosen.
- Crossover specifies how the initial genetic material of an offspring is constructed from the genetic material of the parents.
- Mutation: improvement of the genetic material of an offspring after crossover.
27. Example: Max Cut
- Start with 100 random solutions.
- Select 100 times 2 solutions, where the probability of being selected depends on the solution value.
- Crossover: combine the strings using the bitwise AND operator.
- Mutate: for i = 1, ..., n, flip bit i if this improves the solution value.
- Repeat until 60 populations are generated, or the maximum solution value hasn't increased for 5 generations (a sketch follows below).
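A hedged Python sketch of this scheme, using the bit-string encoding from the previous slide and edges given as index pairs; the population size, generation limit, and stagnation limit follow the slide, while the helper names and the small fitness offset are my own:

```python
import random

def genetic_max_cut(n, edges, pop_size=100, max_generations=60, patience=5):
    def fitness(bits):
        # Cut value of the solution; +1 keeps all selection weights positive.
        return sum(1 for (i, j) in edges if bits[i] != bits[j]) + 1

    def mutate(bits):
        # For each position, flip the bit only if that improves the solution value.
        bits = bits[:]
        for i in range(n):
            flipped = bits[:]
            flipped[i] = 1 - flipped[i]
            if fitness(flipped) > fitness(bits):
                bits = flipped
        return bits

    population = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best, stagnant = max(population, key=fitness), 0
    for _ in range(max_generations):
        weights = [fitness(p) for p in population]
        offspring = []
        for _ in range(pop_size):
            a, b = random.choices(population, weights=weights, k=2)   # value-based selection
            child = [x & y for x, y in zip(a, b)]                     # bitwise-AND crossover
            offspring.append(mutate(child))
        population = offspring
        new_best = max(population, key=fitness)
        if fitness(new_best) > fitness(best):
            best, stagnant = new_best, 0
        else:
            stagnant += 1
        if stagnant >= patience:            # stop if the best value has not increased
            break
    return best

print(genetic_max_cut(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))   # e.g. [1, 0, 1, 0]
```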
28. Exercises
- Give a mathematical programming formulation for Max Cut.
- Give a mathematical programming formulation for Min Cut.
- Give a mathematical programming formulation for segmentation into k subsets, where k is an input parameter.
- Give a mathematical programming formulation for segmentation (where determining k is part of the problem).