Title: Reinforcement Learning
1. KI2 - 11
Reinforcement Learning
Johan Everts
Kunstmatige Intelligentie / RuG
2. What is Learning?
- Learning takes place as a result of interaction between an agent and the world.
- The idea behind learning is that percepts received by an agent should be used not only for acting, but also for improving the agent's ability to behave optimally in the future to achieve its goal.
3. Learning Types
- Supervised learning: a situation in which sample (input, output) pairs of the function to be learned can be perceived or are given.
- Reinforcement learning: the agent acts on its environment and receives some evaluation of its action (reinforcement), but is not told which action is the correct one to achieve its goal.
- Unsupervised learning: no information at all about the correct output is given.
4. Reinforcement Learning
- Task: learn how to behave successfully to achieve a goal while interacting with an external environment; learn through experience.
- Examples:
  - Game playing: the agent knows it has won or lost, but it doesn't know the appropriate action in each state.
  - Control: a traffic system can measure the delay of cars, but does not know how to decrease it.
5Elements of RL
Agent
Policy
Environment
- Transition model, how action influence states
- Reward R, imediate value of state-action
transition - Policy ?, maps states to actions
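To make these elements concrete, here is a minimal Python sketch of a tiny deterministic world. The states, actions, and the dictionaries delta, r, and pi are all illustrative inventions matching the slide's notation, not anything from the original lecture.

```python
# A minimal sketch of the RL elements for a tiny deterministic world.
# All names (states, actions, delta, r, pi) are illustrative.

states = ["s0", "s1", "s2"]
actions = ["left", "right"]

# Transition model delta: how actions influence states.
delta = {
    ("s0", "right"): "s1",
    ("s1", "right"): "s2",
    ("s1", "left"):  "s0",
    ("s2", "left"):  "s1",
}

# Reward r: immediate value of each state-action transition.
r = {key: 0.0 for key in delta}
r[("s1", "right")] = 100.0   # assumed: entering s2 is the goal

# Policy pi: maps states to actions.
pi = {"s0": "right", "s1": "right", "s2": "left"}
```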
6. Elements of RL
7. Elements of RL
- Value function V: maps states to state values.
- Discount factor γ ∈ [0, 1) (here 0.9).
[Figure: V(state) values for a grid world]
8. RL task (restated)
- Execute actions in the environment, observe the results.
- Learn an action policy π : state → action that maximizes the expected discounted reward
  E[r(t) + γ r(t+1) + γ^2 r(t+2) + ...]
  from any starting state in S (a small sketch of this quantity follows below).
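As an illustration of the quantity being maximized, a hedged sketch: the function below sums one observed reward sequence with discount γ = 0.9. The reward sequence is made up.

```python
# Illustrative computation of the discounted return
# r(t) + gamma*r(t+1) + gamma^2*r(t+2) + ... for one reward sequence.

def discounted_return(rewards, gamma=0.9):
    """Sum of gamma**k * r(t+k) over an observed reward sequence."""
    return sum(gamma ** k * rew for k, rew in enumerate(rewards))

print(discounted_return([0, 0, 100]))  # 0 + 0.9*0 + 0.81*100 = 81.0
```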
9. Reinforcement Learning
- Target function is π : state → action.
- RL differs from other function approximation tasks:
  - Partially observable states
  - Exploration vs. exploitation
  - Delayed reward → temporal credit assignment
10. Reinforcement Learning
- Target function is π : state → action.
- However, we have no training examples of the form <state, action>.
- Training examples are of the form <<state, action>, reward>.
11. Utility-based agents
- Try to learn V^π* (abbreviated V*) and perform lookahead search to choose the best action from any state s:
  π*(s) = argmax_a [ r(s, a) + γ V*(δ(s, a)) ]
- Works well if the agent knows
  - δ : state × action → state
  - r : state × action → ℝ
- When the agent doesn't know δ and r, it cannot choose actions this way (a model-based sketch follows below).
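A minimal sketch of this model-based lookahead, assuming the dict-based delta and r from the earlier sketch and a learned state-value table V (also a dict); the function name is illustrative.

```python
# One-step lookahead action choice, usable only when the agent
# knows the transition model delta and the reward function r.

def best_action(s, actions, delta, r, V, gamma=0.9):
    """Choose argmax_a [ r(s, a) + gamma * V(delta(s, a)) ]."""
    available = [a for a in actions if (s, a) in delta]
    return max(available,
               key=lambda a: r[(s, a)] + gamma * V[delta[(s, a)]])
```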
12. Q-learning
- Define a new function, very similar to V*:
  Q(s, a) ≡ r(s, a) + γ V*(δ(s, a))
- If the agent learns Q, it can choose the optimal action even without knowing δ or r.
- Using the learned Q:
  π*(s) = argmax_a Q(s, a)
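Assuming the Q-table is stored as a dict keyed by (state, action), as in the earlier sketches, that choice collapses to a plain table lookup:

```python
# Greedy action from a learned Q-table: pi*(s) = argmax_a Q(s, a).
# No transition model delta or reward function r is needed.

def greedy_action(s, actions, Q):
    return max(actions, key=lambda a: Q[(s, a)])
```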
13. Learning the Q-value
- Note that Q and V* are closely related:
  V*(s) = max_a' Q(s, a')
- This allows us to write Q recursively as
  Q(s, a) = r(s, a) + γ max_a' Q(δ(s, a), a')
14. Learning the Q-value
- FOR each <s, a> DO
  - Initialize table entry Q̂(s, a) ← 0
- Observe current state s
- WHILE (true) DO
  - Select an action a and execute it
  - Receive immediate reward r
  - Observe the new state s'
  - Update the table entry: Q̂(s, a) ← r + γ max_a' Q̂(s', a')
  - Move: s ← s' (record the transition from s to s')
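A runnable sketch of this loop for the deterministic toy world used earlier (dict-based delta and r). Action selection is random here purely for simplicity; all names are illustrative.

```python
import random

# Table-based Q-learning for a deterministic world, following the
# slide's update rule: Q(s, a) <- r + gamma * max_a' Q(s', a').

def q_learning(delta, r, start, gamma=0.9, steps=1000):
    Q = {sa: 0.0 for sa in delta}        # initialize all table entries
    s = start                            # observe current state s
    for _ in range(steps):
        # select an action available in s and execute it
        a = random.choice([x for (st, x) in delta if st == s])
        reward = r[(s, a)]               # receive immediate reward r
        s_next = delta[(s, a)]           # observe the new state s'
        # update the table entry for (s, a)
        future = [Q[sa] for sa in Q if sa[0] == s_next]
        Q[(s, a)] = reward + gamma * max(future, default=0.0)
        s = s_next                       # move: s <- s'
    return Q
```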
15. Q-learning
- Q-learning learns the expected utility of taking a particular action a in a particular state s (the Q-value of the pair (s, a)).
[Figures: grid world showing r(state, action) immediate reward values, Q(state, action) values, and V(state) values]
16. Q-learning
- Demonstration: http://iridia.ulb.ac.be/fvandenb/qlearning/qlearning.html
- eps: probability of using a random action instead of the optimal policy.
- gam: discount factor; the closer to 1, the more weight is given to future reinforcements.
- alpha: learning rate (see the sketch below).
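The demo's exact update is not shown on the slide; the sketch below assumes the common stochastic-world form of the update, which blends the old and new estimates with alpha instead of overwriting as on slide 14. It reuses the dict-based Q, delta, and r from the earlier sketches.

```python
import random

# One eps-greedy Q-learning step with the three demo parameters.

def eps_greedy_step(Q, s, delta, r, eps=0.1, gam=0.9, alpha=0.5):
    avail = [a for (st, a) in delta if st == s]   # actions available in s
    if random.random() < eps:                     # eps: explore at random
        a = random.choice(avail)
    else:                                         # otherwise act greedily
        a = max(avail, key=lambda x: Q[(s, x)])
    s_next = delta[(s, a)]
    future = [Q[sa] for sa in Q if sa[0] == s_next]
    target = r[(s, a)] + gam * max(future, default=0.0)
    # alpha blends the old estimate with the new target
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * target
    return s_next
```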
17. Temporal Difference Learning
- Q-learning estimates a one-time-step difference:
  Q^(1)(s(t), a(t)) = r(t) + γ max_a Q̂(s(t+1), a)
- Why not do the same for n steps?
  Q^(n)(s(t), a(t)) = r(t) + γ r(t+1) + ... + γ^(n-1) r(t+n-1) + γ^n max_a Q̂(s(t+n), a)
18. Temporal Difference Learning
- TD(λ) formula:
  Q^λ(s(t), a(t)) = (1 - λ) [ Q^(1)(s(t), a(t)) + λ Q^(2)(s(t), a(t)) + λ^2 Q^(3)(s(t), a(t)) + ... ]
- Intuitive idea: use a constant 0 ≤ λ ≤ 1 to combine estimates from various lookahead distances (note the normalization factor (1 - λ)).
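A small sketch of the combination itself, assuming the n-step estimates Q^(n) have already been computed; the numbers in the example are made up.

```python
# Combine n-step estimates into the TD(lambda) value:
# Q_lam = (1 - lam) * sum over n of lam**(n-1) * Q^(n),
# truncated at however many estimates are available.

def lambda_return(q_n_estimates, lam=0.5):
    """q_n_estimates[i] is the (i+1)-step estimate Q^(i+1)."""
    total = sum(lam ** n * q for n, q in enumerate(q_n_estimates))
    return (1 - lam) * total

# With lam = 0 only the one-step (Q-learning) estimate survives;
# as lam approaches 1, longer lookaheads dominate.
print(lambda_return([1.0, 2.0, 4.0], lam=0.5))  # 0.5*(1 + 1 + 1) = 1.5
```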
19. Genetic algorithms
- Imagine the individuals as agent functions and the fitness function as the performance measure or reward function.
- No attempt is made to learn the relationship between the rewards and the actions taken by an agent.
- The method simply searches directly in the space of individuals to find one that maximizes the fitness function.
20. Genetic algorithms
- Represent an individual as a binary string.
- Selection works like this: if individual X scores twice as high as Y on the fitness function, then X is twice as likely to be selected for reproduction as Y.
- Reproduction is accomplished by cross-over and mutation (see the sketch below).
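A minimal, self-contained sketch of this whole recipe: binary-string individuals, fitness-proportional selection, single-point cross-over, and bitwise mutation. The fitness function (counting 1-bits) and all parameter values are stand-ins.

```python
import random

def fitness(ind):                 # stand-in: count of 1-bits
    return sum(ind) + 1e-9        # small offset keeps weights positive

def evolve(pop_size=20, length=16, generations=50, p_mut=0.01):
    pop = [[random.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        weights = [fitness(ind) for ind in pop]
        new_pop = []
        for _ in range(pop_size):
            # selection: twice the fitness -> twice the chance of being picked
            mum, dad = random.choices(pop, weights=weights, k=2)
            cut = random.randrange(1, length)       # single-point cross-over
            child = mum[:cut] + dad[cut:]
            child = [b ^ 1 if random.random() < p_mut else b
                     for b in child]                # bitwise mutation
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=fitness)

print(evolve())   # best individual found after the final generation
```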
21. Cart Pole balancing
- Demonstration: http://www.bovine.net/jlawson/hmc/pole/sane.html
22. Summary
- RL addresses the problem of learning control strategies for autonomous agents.
- In Q-learning, an evaluation function over states and actions is learned.
- TD algorithms learn by iteratively reducing the differences between the estimates produced by the agent at different times.
- In the genetic approach, the relation between rewards and actions is not learned; the space of individuals is simply searched for one that maximizes the fitness function.