Title: Intelligent Systems
Intelligent Systems
- Machine learning
- Steven de Jong
- With contributions by K. Tuyls, E. Postma, I.
Sprinkhuizen-Kuyper, and the EvoNet Flying Circus
This lecture
- Goal
- Quickly revisiting material on machine learning from courses you already had
- Giving a preview of material that will be discussed in the M.Sc. program
- Focusing on design and applications
- Interaction!
This lecture
- Content
- Why use machine learning? A tutorial!
- Artificial neural networks
- Evolutionary computation (genetic algorithms)
- Reinforcement learning
- Not the content
- Lots of theory (I will provide some references)
- Two hours of talking (lots of slides, though)
1. Machine learning
- Tutorial based on your assignment
Q: What is the most powerful problem solver in the Universe?
- The (human) brain that created the wheel, New York, wars and so on (after Douglas Adams)
- The evolution mechanism that created the human brain (after Darwin et al.)
Building problem solvers by looking at and mimicking
- brains → neurocomputing
- evolution → evolutionary computing
Taxonomy
Why use machine learning?
- Speed
- Robustness
- Flexibility
- Adaptivity
- Context sensitivity
ML and the assignment
- Create a robot controller that uses planning and other techniques to navigate a physical robot in a maze
- So, why opt for machine learning?
Example 1: recognize a crossing
- Sensors: eight sonar readings around the robot, e.g. 300 mm, 1000 mm, 980 mm, 290 mm, 6000 mm, 760 mm, 780 mm, 5600 mm
Example 1: recognize a crossing (continued)
- Sensors: the same eight sonar readings as above
- Rule-based
- IF 250 < s1 < 350 AND ... (a sketch of such a rule follows below)
- Lots of work
- Crappy performance!
- Sensors are noisy
- What exactly defines a crossing?
- Is it not a T-joint?
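To make the brittleness concrete, here is a minimal sketch of such a hand-written rule; the thresholds and the ordering of the sensors are illustrative assumptions, not part of the assignment:

```python
def looks_like_crossing(s):
    """Hand-written rule on eight sonar readings s[0..7] (mm).
    The thresholds are guesses; noisy sensors easily push a true
    crossing outside these bands, or let a T-joint slip in."""
    return (250 < s[0] < 350 and s[1] > 900 and s[2] > 900 and
            250 < s[3] < 350 and s[4] > 5000 and
            700 < s[5] < 850 and 700 < s[6] < 850 and s[7] > 5000)
```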
The machine learning perspective
- What kind of task is this?
- What input data do we have?
- What output data do we want?
- Supervised or unsupervised learning?
- → Which method is suitable?
The machine learning perspective
- What kind of task is this?
- Classification
- What input data do we have?
- Eight sonar sensor values
- What output data do we want?
- Probability that the values represent a crossing, T-joint, or corridor (a classifier sketch follows this list)
- Supervised or unsupervised learning?
- Probably supervised
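As an illustration of what this setup could look like in code; a sketch only, where the training data, the labels and the choice of model are assumptions, not part of the assignment:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# X: one row of eight sonar readings (mm) per training location
# y: hand-made labels 0=corridor, 1=T-joint, 2=crossing (values invented)
X = np.array([[300, 1000, 980, 290, 6000, 760, 780, 5600],
              [300, 1000, 980, 290,  420, 760, 780, 5600],
              [310,  420, 400, 305,  415, 770, 790,  430]], dtype=float)
y = np.array([2, 1, 0])

clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000)
clf.fit(X, y)
print(clf.predict_proba(X))   # class probabilities per reading
```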
The machine learning perspective
- Supervised learning: classification
- Make a large maze, mark areas as being crossings, T-joints, corridors
- Place robots at random locations and in random orientations
- Train your ML method until all locations are correctly classified
- Problem: classification depends on orientation!
The machine learning perspective
- Orientation: the maze is seen from the robot's point of view!
Example 2: keep your lane
- Robot follows a hallway
- Possibly with angles!
- Problem
- Noise on actuators (motors, 3rd wheel, dust on the floor)
- The worse the robot's position, the worse the performance on classification
Example 2: keep your lane
- Idea
- Monitor distance from the walls
- If the robot is significantly off-center, perform a correction
- Problems
- Distance monitoring
- Some lanes are really short
- What if robot already badly aligned?
The machine learning perspective
- What kind of task is this?
- Control
- What input data do we have?
- Eight sonar sensor values (and camera)
- What output data do we want?
- A robot that keeps itself aligned
- Supervised or unsupervised learning?
- Probably unsupervised
The machine learning perspective
- Unsupervised learning: control
- Develop a large maze
- Develop tasks: move from crossing A to crossing E (adjacent)
- Couple sensory information to motors with some ML method
- Quality (a sketch of such a measure follows this list)
- Short time needed to reach destination
- Low number of collisions with the wall
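A minimal illustration of how such a quality (fitness) score could be computed for one trial run; the weighting and the parameter names are assumptions:

```python
def controller_quality(time_to_goal_s, collisions, reached_goal,
                       time_limit_s=120.0, collision_penalty=5.0):
    """Higher is better: reward reaching the destination quickly,
    penalize every collision with a wall. Weights are arbitrary."""
    if not reached_goal:
        return 0.0
    speed_score = max(0.0, time_limit_s - time_to_goal_s)
    return speed_score - collision_penalty * collisions
```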
Sensory-motor coordination
- Idea
- Enhance the information obtained by sensors by actively using the motors
- For example, for
- aligning the robot, or
- being more sure about classification,
- you might stop forward movement and start rotating the robot in a scanning fashion
2. Artificial Neural Networks
Artificial neural networks
- You have seen these often in the past
- I will provide only a quick overview
- Slides will be put online
Recommended literature
- Russell & Norvig, Ch. 19, pp. 563-587
- and many more
A peek into the neural computer
- Is it possible to develop a computer model after the natural example (the human brain)?
- Brain-inspired models
- Models that possess a limited number of
structural and functional properties of the
neural computer
Neurons, the building blocks of the brain
Neural activity
(figure: neuron output activity as a function of input)
Synapses, the basis of learning and memory
Hebbian learning (Donald Hebb)
Δw(1,2) ∝ a(1) · a(2)
(Artificial) Neural Networks
- Neurons
- Activity
- Non-linear transfer function (!)
- Connections
- Adaptive weights
- Learning
- Supervised
- Unsupervised
Artificial Neurons
- input (vectors)
- summation (excitation)
- output (activation)
(figure: inputs i1, i2, i3 are summed into the excitation e; the activation is a = f(e))
Transfer function
- Non-linear function (sigmoid)
(figure: sigmoid transfer function f(x) plotted against x)
Artificial connections (Synapses)
- wAB
- The weight of the connection from neuron A to
neuron B
The Perceptron
Learning in the Perceptron
- Delta learning rule (supervised, written out below)
- Based on the difference between the target t and the actual output o, given input x
- Global error E
- Is a function of the differences between the
target and actual output of the patterns to be
learnt
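The formulas behind these two bullets, in their standard form; the original slides show them as images, and the learning-rate symbol η is my notation:

```latex
\Delta w_i = \eta \, (t - o) \, x_i
\qquad
E(\mathbf{w}) = \tfrac{1}{2} \sum_{d \in D} \bigl( t_d - o_d \bigr)^2
```

Here D is the set of training patterns, and the weight update follows the negative gradient of E.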
Gradient descent
Decision boundaries: linear!
The multilayer perceptron
(figure: network with an input layer, a hidden layer, and an output layer)
Learning in the MLP
Sigmoid function (logistic)
- Alternative: tanh (range (-1, 1) instead of (0, 1))
- Derivative: f'(x) = f(x) · (1 − f(x)) (a small code sketch follows)
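A small sketch of these activation functions and the logistic derivative; plain NumPy, with names of my own choosing:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))          # output in (0, 1)

def sigmoid_deriv(x):
    s = sigmoid(x)
    return s * (1.0 - s)                      # f'(x) = f(x)(1 - f(x))

# tanh (output in (-1, 1)) is available as np.tanh; its derivative is 1 - tanh(x)**2
```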
Updating the hidden-to-output weights
Updating the input-to-hidden weights
Forward and backward propagation
Implementation
- Use ADT for graphs
- Or just use matrices and vectors
- Vectors for input and output
- Matrices for each transition / layer (wij)
- Learning (a sketch follows this list)
- Supervised: e.g., backpropagation
- Unsupervised: e.g., evolutionary algorithms
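A minimal matrix-and-vector sketch of the idea: one hidden layer, logistic activations, plain backpropagation. The toy data (XOR), the network size and all names are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def add_bias(A):
    # Append a constant 1 to every pattern so the weight matrices include biases
    return np.hstack([A, np.ones((A.shape[0], 1))])

# Toy data: XOR (4 patterns, 2 inputs, 1 output)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(scale=0.5, size=(3, 4))   # (input + bias) -> hidden
W2 = rng.normal(scale=0.5, size=(5, 1))   # (hidden + bias) -> output
eta = 0.5

for epoch in range(20000):
    # Forward propagation
    Xb = add_bias(X)
    H = sigmoid(Xb @ W1)
    Hb = add_bias(H)
    O = sigmoid(Hb @ W2)
    # Backward propagation of the error
    delta_out = (T - O) * O * (1 - O)
    delta_hid = (delta_out @ W2[:-1].T) * H * (1 - H)
    # Gradient-descent weight updates
    W2 += eta * Hb.T @ delta_out
    W1 += eta * Xb.T @ delta_hid

print(np.round(O, 2))   # approaches [[0], [1], [1], [0]] on most runs
```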
Break
3. (a) Introduction to Evolutionary Computation
- (Ida Sprinkhuizen-Kuyper and EvoNet Flying Circus)
Recommended literature
- EvoNet site
- http://www.dcs.napier.ac.uk/evonet
- Slides, demos
- T. M. Mitchell, Machine Learning, 1997
- http://www.cs.cmu.edu/~tom/ (slides ch. 9)
- Other literature
- Goldberg (1989)
- Michalewicz (1996)
- Bäck (1996)
History
- L. Fogel, 1962 (San Diego, CA): Evolutionary Programming
- J. Holland, 1962 (Ann Arbor, MI): Genetic Algorithms
- I. Rechenberg & H.-P. Schwefel, 1965 (Berlin, Germany): Evolution Strategies
- J. Koza, 1989 (Palo Alto, CA): Genetic Programming
The Metaphor
- EVOLUTION ↔ PROBLEM SOLVING
- Individual ↔ Candidate solution
- Fitness ↔ Quality
- Environment ↔ Problem
The Ingredients
(figure: population at time t, selection of parents, reproduction, population at time t+1)
The Evolution Mechanism
- Increasing diversity by genetic operators
- Mutation: local search
- Recombination (crossover): global search
- Decreasing diversity by selection
- Of parents
- Of survivors
The Evolutionary Cycle
(figure: cycle of Selection → Recombination → Mutation → Replacement over the population)
Main Streams
- Genetic Algorithms
- Evolution Strategies
- Evolutionary Programming
- Genetic Programming
Domains of Application
- Numerical, Combinatorial Optimisation
- System Modeling and Identification
- Planning and Control
- Engineering Design
- Data Mining
- Machine Learning
- Artificial Life
- Evolving neural networks
Performance
- Acceptable performance at acceptable costs on a wide range of problems
- Intrinsic parallelism (robustness, fault tolerance)
- Superior to other techniques on complex problems with
- lots of data, many free parameters
- complex relationships between parameters
- many (local) optima
Advantages
- No presumptions w.r.t. problem space
- Widely applicable
- Low development and application costs
- Easy to incorporate other methods
- Solutions are interpretable (unlike NN)
- Can be run interactively, accommodate user-proposed solutions
- Provide many alternative solutions
Disadvantages
- No guarantee of finding an optimal solution within finite time
- Weak theoretical basis
- May need parameter tuning
- Often computationally expensive, i.e. slow
3. (b) How to Build an Evolutionary Algorithm
- (Ida Sprinkhuizen-Kuyper and EvoNet Flying Circus)
Evolutionary algorithms
- Evolutionary algorithms: quick implementation guide
- Evolving artificial neural networks
- GA(fitness, threshold, p, c, m)
- fitness is a function calculating the fitness of an individual in the gene pool
- threshold: either the fitness to reach or the number of generations
- p is the population size
- c is the crossover probability ∈ [0, 1]
- m is the mutation probability ∈ [0, 1]
- Initialize: P ← p random individuals
- Evaluate: for each i in P, compute fitness(i)
- While max_i fitness(i) < threshold or generation < threshold:
- Select: probabilistically select (1 − c)·p individuals out of P to add to Ps
- Crossover: probabilistically select (c/2)·p pairs of individuals from P. For each pair ⟨i1, i2⟩, produce two offspring by applying the crossover operator. Add the offspring to Ps too.
- Mutate: apply the mutation operator to m·p random members of Ps
- Update: P ← Ps
- Evaluate: for each h in P, compute fitness(h)
- Shift generation: generation ← generation + 1
- Return the individual from P that has the highest fitness (implemented in the sketch below)
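A compact Python sketch of this loop for a bit-string representation. The operators, parameter values and the toy "count the ones" fitness function are illustrative assumptions, not part of the assignment:

```python
import random

def ga(fitness, threshold, p=50, c=0.6, m=0.01, n_bits=20, max_gen=200):
    """Minimal genetic algorithm over bit strings, following the pseudocode above."""
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(p)]
    for gen in range(max_gen):
        best = max(pop, key=fitness)
        if fitness(best) >= threshold:
            break
        # Select (1 - c)*p individuals by fitness-proportionate (roulette) selection
        weights = [fitness(ind) + 1e-9 for ind in pop]
        new_pop = random.choices(pop, weights=weights, k=int((1 - c) * p))
        # Crossover: (c/2)*p pairs, one-point crossover producing two offspring each
        for _ in range(int(c * p / 2)):
            p1, p2 = random.choices(pop, weights=weights, k=2)
            cut = random.randrange(1, n_bits)
            new_pop += [p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]]
        # Mutate m*p random members of the new population by flipping one random gene
        for ind in random.sample(new_pop, k=max(1, int(m * p))):
            ind[random.randrange(n_bits)] ^= 1
        pop = new_pop
    return max(pop, key=fitness)

# Toy fitness: number of ones in the bit string
print(ga(fitness=sum, threshold=20))
```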
The Steps
- In order to build an evolutionary algorithm, there are a number of steps that we have to perform
- Design a representation
- Decide how to initialise a population
- Design a way of mapping a genotype to a phenotype
- Design a way of evaluating an individual
Further Steps
- Design suitable mutation operator(s)
- Design suitable recombination operator(s)
- Decide how to manage our population
- Decide how to select individuals to be parents
- Decide how to select individuals to be replaced
- Decide when to stop the algorithm
Designing a Representation
- We have to come up with a method of representing an individual as a genotype.
- There are many ways to do this, and the way we choose must be relevant to the problem that we are solving.
- When choosing a representation, we have to bear in mind how the genotypes will be evaluated and what the genetic operators might be.
Example: Discrete Representation
- An individual can be represented using discrete values (binary, integer, or any other system with a discrete set of values).
- The following is an example of a binary representation.
(figure: a chromosome as a string of bits; each bit is a gene)
Example: Discrete Representation
(figure: a bit-string genotype and the phenotype it is mapped to)
Example: Discrete Representation
- Phenotype could be integer numbers
- Genotype 10100011 → Phenotype 163:
1·2^7 + 0·2^6 + 1·2^5 + 0·2^4 + 0·2^3 + 0·2^2 + 1·2^1 + 1·2^0 = 128 + 32 + 2 + 1 = 163
Example: Discrete Representation
- Phenotype could be real numbers
- e.g. a number between 2.5 and 20.5 using 8 binary digits
- Genotype 10100011 (= 163) → Phenotype 2.5 + (163/256)·(20.5 − 2.5) = 13.9609 (decoded in the sketch below)
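A sketch of this decoding; dividing by 2^8 = 256 reproduces the 13.9609 on the slide, though dividing by 255 would be an equally valid convention:

```python
def decode(bits, low=2.5, high=20.5):
    """Map a binary genotype (list of 0/1) to a real-valued phenotype in [low, high)."""
    value = int("".join(map(str, bits)), 2)       # e.g. [1,0,1,0,0,0,1,1] -> 163
    return low + value / 2 ** len(bits) * (high - low)

print(decode([1, 0, 1, 0, 0, 0, 1, 1]))           # 13.9609375
```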
Example: Real-valued representation
- A very natural encoding: if the solution we are looking for is a list of real-valued numbers, then encode it as a list of real-valued numbers! (i.e., not as a string of 1s and 0s)
- Lots of applications, e.g. parameter optimisation (ANNs!)
Example: Real-valued representation
- Individuals are represented as a tuple of n real-valued numbers
- The fitness function maps tuples of real numbers to a single real number
Phenotype to Genotype
- Sometimes producing the phenotype from the genotype is a simple and obvious process.
- Other times the genotype might be a set of parameters to some algorithm, which works on the problem data to produce the phenotype
(figure: Genotype + Problem Data → Growth Function → Phenotype)
Evaluating an Individual
- This is by far the most costly step for real applications
- Do not re-evaluate unmodified individuals
- It might be a subroutine, a black-box simulator, or any external process (e.g. a robot experiment)
- You could use approximate fitness, but not for too long
More on Evaluation
- Constraint handling: what if the phenotype breaks some constraint of the problem?
- penalize the fitness
- specific evolutionary methods
- Multi-objective evolutionary optimization
gives a set of compromise solutions
Mutation Operators
- We might have one or more mutation operators for our representation.
- Some important points are:
- At least one mutation operator should allow every part of the search space to be reached
- The size of mutation is important and should be controllable
- Mutation should produce valid chromosomes
Example
(figure: a bit string before and after mutation, with one mutated gene flipped)
Mutation usually happens with probability pm for each gene
Example: Real-valued mutation
- Perturb values by adding some random noise
- Often, a Gaussian/normal distribution N(0, σ) is used, where
- 0 is the mean value
- σ is the standard deviation
- and
- x'_i = x_i + N(0, σ_i)
- for each parameter (see the sketch below)
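A one-function sketch of this operator in NumPy; using a separate σ per parameter is optional and the function name is mine:

```python
import numpy as np

def gaussian_mutation(x, sigma, rng=np.random.default_rng()):
    """Return a mutated copy of real-valued individual x: x_i' = x_i + N(0, sigma_i)."""
    x = np.asarray(x, dtype=float)
    return x + rng.normal(0.0, sigma, size=x.shape)

print(gaussian_mutation([1.0, 2.0, 3.0], sigma=0.1))
```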
Recombination (crossover)
- We might have one or more recombination operators for our representation.
- Some important points are:
- The child should inherit something from each parent. If this is not the case, then the operator is a mutation operator.
- The recombination operator should be designed in conjunction with the representation so that recombination is not always catastrophic
- Recombination should produce valid chromosomes.
Example: Recombination for Discrete Representation
(figure: two parent chromosomes from the population are cut and recombined into offspring)
Each chromosome is cut into n pieces which are recombined. (Example for n = 1)
Example: Recombination for real-valued representation
- Discrete recombination (uniform crossover): given two parents, one child is created by copying each gene from one of the two parents
Example: Recombination for real-valued representation
- Intermediate recombination (arithmetic crossover): given two parents, one child is created by taking a weighted average of the corresponding parent genes (a code sketch of both operators follows below)
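A sketch of both recombination operators for real-valued tuples; the mixing weight α and how it is sampled are assumptions:

```python
import numpy as np

rng = np.random.default_rng()

def uniform_crossover(x, y):
    """Discrete recombination: each child gene is taken from one of the parents."""
    mask = rng.random(len(x)) < 0.5
    return np.where(mask, x, y)

def arithmetic_crossover(x, y, alpha=None):
    """Intermediate recombination: each child gene is a weighted average of the parents."""
    if alpha is None:
        alpha = rng.random()          # one mixing weight for the whole child
    return alpha * np.asarray(x) + (1 - alpha) * np.asarray(y)

x, y = np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0])
print(uniform_crossover(x, y), arithmetic_crossover(x, y))
```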
Selection Strategy
- We want to have some way to ensure that better individuals have a better chance of being parents than less good individuals.
- This will give us selection pressure, which will drive the population forward.
- We have to be careful to give less good individuals at least some chance of being parents; they may include some useful genetic material.
Example: Fitness proportionate selection
- Expected number of times an individual with fitness f_i is selected for mating: its fitness divided by the average fitness of the population
- Better (fitter) individuals have
- more space (on the roulette wheel)
- more chances to be selected
(figure: roulette wheel with slices proportional to fitness, from best to worst)
Example: Fitness proportionate selection
- Disadvantages
- Danger of premature convergence because outstanding individuals take over the entire population very quickly
- Low selection pressure when fitness values are near each other
- Behaves differently on transposed versions of the same function
Example: Fitness proportionate selection
- Fitness scaling: a cure for FPS
- Start with the raw fitness function f.
- Standardise to ensure
- Lower fitness is better fitness.
- Optimal fitness equals 0.
- Adjust to ensure
- Fitness ranges from 0 to 1.
- Normalise to ensure
- The sum of the fitness values equals 1.
Example: Tournament selection (a code sketch follows this list)
- Select k random individuals, without replacement
- Take the best
- k is called the size of the tournament
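A sketch of tournament selection in Python; the fitness callable and the default tournament size are assumptions:

```python
import random

def tournament_select(population, fitness, k=3):
    """Pick k random individuals without replacement and return the best one."""
    contestants = random.sample(population, k)
    return max(contestants, key=fitness)
```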
Example: Rank-based selection
- Individuals are sorted on their fitness value from best to worst. The place in this sorted list is called the rank.
- Instead of using the fitness value of an individual, the rank is used by a function to select individuals from this sorted list. The function is biased towards individuals with a high rank (= good fitness).
Replacement Strategy
- The selection pressure is also affected by the way in which we decide which members of the population to kill in order to make way for our new individuals.
- We can use the stochastic selection methods in reverse, or there are some deterministic replacement strategies.
- We can decide never to replace the best in the population: elitism.
Recombination vs Mutation
- Recombination
- modifications depend on the whole population
- decreasing effects with convergence
- exploitation operator
- Mutation
- mandatory to escape local optima
- strong causality principle
- exploration operator
Stopping criterion
- The optimum is reached!
- Limit on CPU resources: maximum number of fitness evaluations
- Limit on the user's patience: after some generations without improvement
Algorithm performance
- Never draw any conclusion from a single run
- use statistical measures (averages, medians)
- from a sufficient number of independent runs
- From the application point of view
- design perspective
- find a very good solution at least once
- production perspective
- find a good solution at almost every run
Algorithm Performance (2)
- Remember the WYTIWYG principle
- What you test is what you get: don't tune algorithm performance on toy data and expect it to work with real data.
Key issues
- Genetic diversity
- differences of genetic characteristics in the population
- loss of genetic diversity: all individuals in the population look alike
- snowball effect
- convergence to the nearest local optimum
- in practice, it is irreversible
Key issues (2)
- Exploration vs. Exploitation
- Exploration: sample unknown regions
- Too much exploration: random search, no convergence
- Exploitation: try to improve the best-so-far individuals
- Too much exploitation: local search only, convergence to a local optimum
4. Reinforcement learning
- (I. Sprinkhuizen-Kuyper, K. Tuyls)
Recommended literature
- Sutton, R.S. and A.G. Barto (1998), Reinforcement Learning: An Introduction, MIT Press. http://www.cs.ualberta.ca/~sutton/book/the-book.html
- Mitchell, T. (1997). Machine Learning. McGraw Hill.
- RL repository at MSU (http://web.cps.msu.edu/rlr)
Reinforcement Learning
- Roots of reinforcement learning (RL)
- Preliminaries (need to know!)
- The setting
- Properties
- The Markov Property
- Markov Decision Processes (MDP)
Roots of Reinforcement Learning
- Origins from
- Mathematical psychology (early 1900s)
- Control theory (early '50s)
- Mathematical psychology
- Edward Thorndike: research on animals via puzzle boxes
- Bush & Mosteller: developed one of the first models of learning behavior
- Control theory
- Richard Bellman: stability theory of differential equations; how to design an optimal controller?
- Inventor of Dynamic Programming: solving optimal control problems by solving the Bellman equations!
Preliminaries: Setting of Reinforcement Learning
- What is it?
- Learning from interaction
- Learning about, from, and while interacting with an external environment
- Learning what to do (how to map situations to actions) so as to maximize a numerical reward signal
Preliminaries: Setting of Reinforcement Learning
- Key features?
- Learner is not told which actions to take
- Trial-and-Error search
- Possibility of delayed reward
- Sacrifice short-term gains for greater long-term gains
- The need to explore and exploit
- Considers the whole problem of a goal-directed
agent interacting with an uncertain environment
Preliminaries: properties of RL. Supervised versus Unsupervised
- Supervised learning: training info = desired (target) outputs; a supervised learning system maps inputs to outputs; error = (target output − actual output)
- Reinforcement learning: training info = evaluations (rewards / penalties); a reinforcement learning system maps states to actions; objective = get as much reward as possible
Preliminaries: properties of RL. The Agent-Environment Interface
(figure: the agent takes actions in the environment and receives states and rewards in return)
Preliminaries: properties of RL. Learning how to behave
- Reinforcement learning methods specify how the agent changes its policy as a result of experience.
- Roughly, the agent's goal is to get as much reward as it can over the long run.
Preliminaries: properties of RL. Abstraction
- Getting the Degree of Abstraction Right
- Time steps need not refer to fixed intervals of real time.
- Actions can be low level (voltages to motors), or high level (accept a job offer), mental (shift focus of attention), etc.
- States can be low-level sensations, or abstract, symbolic, based on memory, or subjective (surprised or lost).
- The environment is not necessarily unknown to the agent, only incompletely controllable.
Preliminaries: Properties of RL. Goals and Rewards
- Is a scalar reward signal an adequate notion of a goal? Maybe not, but it is surprisingly flexible.
- A goal should specify what we want to achieve, not how we want to achieve it.
- A goal must be outside the agent's direct control, thus outside the agent.
- The agent must be able to measure success
- explicitly
- frequently during its lifespan.
Preliminaries: Properties of RL. What's the objective?
Episodic tasks: interaction breaks naturally into episodes, e.g., plays of a game, trips through a maze.
(the slide contrasts immediate reward with long-term reward; the episodic return is reconstructed below)
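The standard definition of the (undiscounted) return for an episodic task, in Sutton and Barto's notation; this formula is not in the extracted text, so it is a reconstruction:

```latex
R_t = r_{t+1} + r_{t+2} + \dots + r_T
```

where T is the final time step of the episode.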
Preliminaries: Properties of RL. Returns for Continuing Tasks
Continuing tasks: interaction does not have natural episodes.
Discounted return (reconstructed below)
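The discounted return in Sutton and Barto's notation, again a reconstruction of the missing formula, with discount rate 0 ≤ γ ≤ 1:

```latex
R_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \dots
    = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1}
```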
An Example
Avoid failure: the pole falling beyond a critical angle or the cart hitting the end of the track.
As an episodic task where episode ends upon
failure
As a continuing task with discounted return
In either case, return is maximized by avoiding
failure for as long as possible.
Another Example
Get to the top of the hill as quickly as possible.
Return is maximized by minimizing the number of steps to reach the top of the hill.
Preliminaries: properties of RL. A Unified Notation
- Think of each episode as ending in an absorbing state that always produces a reward of zero
- We can cover all cases by writing (formula reconstructed below)
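A reconstruction of the unified return, since the extracted text drops the formula; γ = 1 is allowed only for episodic tasks, where the absorbing state makes the sum finite:

```latex
R_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1}
```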
The Markov Property
- The state at step t means whatever information is available to the agent at step t about its environment.
- The state can include immediate sensations, highly processed sensations, and structures built up over time from sequences of sensations.
- A state should summarize past sensations so as to retain all essential information, i.e., it should have the Markov Property (formalized below)
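The usual formal statement of the Markov Property, in standard Sutton and Barto form (not present in the extracted text):

```latex
\Pr\{s_{t+1}=s',\, r_{t+1}=r \mid s_t, a_t, r_t, \dots, s_0, a_0\}
  = \Pr\{s_{t+1}=s',\, r_{t+1}=r \mid s_t, a_t\}
```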
Markov Decision Processes
- If a reinforcement learning task has the Markov Property, it is basically a Markov Decision Process (MDP).
- If the state and action sets are finite, it is a finite MDP.
- To define a finite MDP, you need to give
- state and action sets
- one-step dynamics defined by transition probabilities
- reward probabilities (both quantities are written out below)
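In Sutton and Barto's notation these two quantities are (a reconstruction, since the formulas were lost in extraction):

```latex
P^{a}_{ss'} = \Pr\{s_{t+1}=s' \mid s_t=s,\, a_t=a\}
\qquad
R^{a}_{ss'} = E\{r_{t+1} \mid s_t=s,\, a_t=a,\, s_{t+1}=s'\}
```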
An Example: Finite MDP
- At each step, the robot has to decide whether it should (1) actively search for a can, (2) wait for someone to bring it a can, or (3) go to home base and recharge.
- Searching is better but runs down the battery; if it runs out of power while searching, it has to be rescued (which is bad).
- Decisions are made on the basis of the current energy level: high, low.
- Reward = number of cans collected
Recycling Robot MDP
Reinforcement procedure
- Bellman equations (the state-value form is written out below)
- Policy evaluation and improvement
- Policy iteration (value functions)
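For reference, the Bellman equation for the state-value function of a policy π, in the MDP notation above; a standard result, added here because the slide itself only names it:

```latex
V^{\pi}(s) = \sum_{a} \pi(s,a) \sum_{s'} P^{a}_{ss'} \bigl[ R^{a}_{ss'} + \gamma V^{\pi}(s') \bigr]
```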
Reinforcement methods
- Dynamic programming
- Monte Carlo methods
- Temporal Difference (TD) learning (see the update rule below)
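As a concrete instance of the third bullet, the standard TD(0) update for estimating the state-value function; the step-size α is the usual learning-rate parameter:

```latex
V(s_t) \leftarrow V(s_t) + \alpha \bigl[ r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \bigr]
```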
Wrapping up