Introduction to Reinforcement Learning

1
Introduction to Reinforcement Learning
  • Freek Stulp

2
Overview
  • General principles of RL
  • Markov Decision Process as model
  • Values of states V(s)
  • Values of state-actions Q(a,s)
  • Exploration vs. Exploitation
  • Issues in RL
  • Conclusion

3
General principles of RL
  • Neural networks are supervised learning
    algorithms: for each input, we know the output.
  • What if we don't know the output for each input?
  • Flight control system example
  • Let the agent learn how to achieve certain goals
    itself, through interaction with the environment.

4
General principles of RL
  • Let the agent learn how to achieve certain goals
    itself, through interaction with the environment.
  • This does not solve the problem!

5
Popular model: MDPs
  • Markov Decision Process ⟨S, A, R, T⟩
  • Set of states S
  • Set of actions A
  • Reward function R
  • Transition function T
  • Markov property
  • T_{ss'} depends only on the states s and s', not on the history
  • Policy π: S → A
  • Problem: find the policy π that maximizes the reward
  • Discounted reward: r_0 + γ r_1 + γ^2 r_2 + ... + γ^n r_n
    (sketch below)

(Diagram: the agent in state s0 takes action a0, receives reward r0, and moves to state s1, where it takes action a1, ...)
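Not from the slides, but a minimal Python sketch of the discounted reward above; the reward sequence and γ = 0.9 are made-up example values:

    # discounted reward r_0 + gamma*r_1 + gamma^2*r_2 + ... for a list of rewards
    def discounted_reward(rewards, gamma=0.9):
        return sum(gamma**t * r for t, r in enumerate(rewards))

    print(discounted_reward([1.0, 0.0, 0.0, 10.0]))  # 1.0 + 0.9**3 * 10.0 ≈ 8.29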
6
Values of states Vπ(s)
  • Definition of value Vπ(s)
  • Cumulative reward when starting in state s and
    executing some policy until a terminal state is
    reached (see the sketch below).
  • The optimal policy yields V*(s)
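A minimal Python sketch of this definition; the environment function step(s, a) -> (reward, next_state), the policy pi, and the set of terminal states are assumptions for illustration:

    # value of state s under policy pi: cumulative reward until a terminal state
    def value_of(s, pi, step, terminals):
        total = 0.0
        while s not in terminals:
            r, s = step(s, pi(s))   # act according to the policy
            total += r
        return total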

7
Determining Vπ(s)
  • Dynamic programming: V(s) = R(s) + Σ_{s'} T_{ss'} V(s')
    (- necessary to consider all states)
  • TD-learning: V(s) ← V(s) + α (R(s) + V(s') - V(s))
    (+ only visited states are used; see the sketch below)
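A minimal TD-learning sketch in Python, mirroring the update above (no discount factor, as on the slide); the episode format (state, reward, next_state) and the values are made up for illustration:

    # TD update: V(s) <- V(s) + alpha * (R(s) + V(s') - V(s))
    def td_update(V, episode, alpha=0.1):
        for s, r, s_next in episode:
            V[s] += alpha * (r + V.get(s_next, 0.0) - V[s])
        return V

    V = {"s0": 0.0, "s1": 0.0}
    td_update(V, [("s0", 1.0, "s1"), ("s1", 0.0, "end")])  # only visited states change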
8
Values of state-actions Q(a,s)
  • Q-values Q(a,s): the value of doing action a in state s.
  • Dynamic programming: Q(a,s) = R(s) + Σ_{s'} T_{ss'} max_{a'} Q(a',s')
  • TD-learning:
  • Q(a,s) ← Q(a,s) + α (R(s) + max_{a'} Q(a',s') - Q(a,s))
    T is not in this formula: model-free learning! (see the
    sketch below)
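A minimal model-free Q-learning sketch in Python, mirroring the TD update above (again without a discount factor, as on the slide); the state and action names and the sample transition are made up for illustration:

    from collections import defaultdict

    Q = defaultdict(float)              # Q[(action, state)], defaults to 0.0
    actions = ["left", "right"]

    def q_update(state, action, reward, next_state, alpha=0.1):
        best_next = max(Q[(a, next_state)] for a in actions)
        Q[(action, state)] += alpha * (reward + best_next - Q[(action, state)])

    q_update("s0", "right", 1.0, "s1")  # learns from one observed transition; no T needed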

9
Exploration vs. Exploitation
  • Only exploitation
  • New (maybe better) paths are never discovered
  • Only exploration
  • What has been learned is never exploited
  • Good trade-off
  • Explore first to learn, exploit later to benefit (see the
    ε-greedy sketch below)
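Epsilon-greedy action selection is one common way to realize this trade-off (not covered on the slides); a minimal sketch, assuming a Q-table keyed by (action, state) as in the Q-learning sketch above:

    import random
    from collections import defaultdict

    Q = defaultdict(float)   # Q[(action, state)]

    def epsilon_greedy(state, actions, epsilon=0.1):
        if random.random() < epsilon:
            return random.choice(actions)                  # explore
        return max(actions, key=lambda a: Q[(a, state)])   # exploit

    action = epsilon_greedy("s0", ["left", "right"])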

10
Some issues
  • Hidden state
  • If you don't know where you are, you can't know
    what to do.
  • Curse of dimensionality
  • Very large state spaces.
  • Continuous states/action spaces
  • All of the algorithms above use discrete tables over the state
    and action spaces. What about continuous values?
  • Many of your articles discuss solutions to these
    problems.

11
Conclusion
  • RL: learning through interaction and rewards.
  • Markov Decision Process: a popular model
  • Values of states V(s)
  • Values of state-actions Q(a,s) (model-free!)
  • Still some problems... not quite ready for
    complex real-world problems yet, but research is
    underway!

12
Literature
  • Artificial Intelligence: A Modern Approach
  • Stuart Russell and Peter Norvig
  • Machine Learning
  • Tom M. Mitchell
  • Reinforcement Learning: A Tutorial
  • Mance E. Harmon and Stephanie S. Harmon