Transcript and Presenter's Notes

Title: Policy Gradient in Continuous Time


1
Policy Gradient in Continuous Time
by Rémi Munos, JMLR 2006
Presented by Hui Li, Duke University Machine
Learning Group, May 30, 2007
2
Outline
  • Introduction
  • Discretized Stochastic Processes Approximation
  • Model-free Reinforcement Learning (RL)
    Algorithm
  • Example Results

3
Introduction of the Problem
  • Consider an optimal control problem with
    continuous state

System dynamics
  • Deterministic process
  • Continuous state
  • Objective: find an optimal control (u_t) that
    maximizes the functional

Objective function
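
The equations on this slide were images and did not
survive the transcript. A reconstruction consistent
with the setting of Munos (2006), writing θ for the
policy parameter (the experiments later in the deck
use only the terminal reward R(x_T)):

    \dot{x}_t = f(x_t, u_t), \qquad x_t \in \mathbb{R}^d, \; u_t \in U

    J(x; (u_t)_{t \ge 0}) = \int_0^T r(x_t, u_t)\, dt + R(x_T)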
4
Introduction of the Problem
  • Consider a class of parameterized policies π_θ
    with
  • Find the parameter θ that maximizes the
    performance measure
  • The standard approach is a gradient ascent
    method (sketched below)

object of the paper
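
The update referenced above was not transcribed; it
is standard gradient ascent (η is an illustrative
step size):

    \theta \leftarrow \theta + \eta\, \nabla_\theta J(\theta)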
5
Introduction of the Problem
How to compute the gradient of J(θ)?
  • Finite-difference method

This method requires a large number of
trajectories to estimate the gradient of the
performance measure.
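
The estimator itself was an image; a standard
reconstruction (ε is an illustrative perturbation
size, e_i the i-th coordinate vector):

    \frac{\partial J}{\partial \theta_i}(\theta) \approx \frac{J(\theta + \epsilon e_i) - J(\theta)}{\epsilon}

Estimating all m partial derivatives this way needs
at least m + 1 trajectories per gradient step,
which is why the method is costly.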
  • Pathwise estimation of the gradient

Compute the gradient using one trajectory only
6
Introduction of the Problem
Pathwise estimation of the gradient
  • Define
  • Dynamics of z_t
  • Gradient

unknown
known
  • In reinforcement learning, the dynamics f is
    unknown. How can we approximate z_t?
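
The definitions on this slide were equation images.
A hedged reconstruction of the pathwise quantities
in the terminal-reward case:

    z_t := \nabla_\theta x_t

    \dot{z}_t = \nabla_x f(x_t, u_t)\, z_t + \nabla_\theta f(x_t, \pi_\theta(x_t))

    \nabla_\theta J(\theta) = \nabla_x R(x_T)\, z_T

The model Jacobian \nabla_x f is the "unknown" term
annotated above; the policy-dependent term is
"known".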

7
Discretized Stochastic Processes Approximation
  • A General Convergence Result (Theorem 3),
    roughly: if the average jump of a discrete-time
    stochastic process matches a continuous drift
    to first order in the time step (and the
    fluctuations of the jumps vanish with the
    step), then the process converges to the
    corresponding deterministic continuous
    trajectory
8
  • Discretization of the state
  • Stochastic policy
  • Stochastic discrete state process

Initialization
Jump in state
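
A reconstruction of the discretized process (the
equations were images; δ is the time step):

    x_0 = x; \qquad u_t \sim \pi_\theta(\cdot \mid x_t); \qquad x_{t+\delta} = x_t + \delta\, f(x_t, u_t)

Proposition 5 (next slide) then says, roughly, that
as δ → 0 this stochastic process converges to the
deterministic trajectory of the policy-averaged
dynamics \bar{f}(x) = \sum_{u \in U} \pi_\theta(u \mid x)\, f(x, u).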
9
Proof of Proposition 5
From Taylor's formula
The average jump
Applying Theorem 3 directly, Proposition 5 is
proved.
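
The average-jump computation (an image in the
original) plausibly reads:

    E[x_{t+\delta} - x_t \mid x_t] = \delta \sum_{u \in U} \pi_\theta(u \mid x_t)\, f(x_t, u) + o(\delta) = \delta\, \bar{f}(x_t) + o(\delta)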
10
  • Discretization of the state gradient
  • Stochastic discrete state gradient process

Initialization
With
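
One plausible form of the process (the equations
were images), using the likelihood-ratio identity
\nabla_\theta \pi = \pi\, \nabla_\theta \log \pi:

    z_0 = 0; \qquad z_{t+\delta} = z_t + \delta \big[ \nabla_x f(x_t, u_t)\, z_t + f(x_t, u_t)\, \nabla_\theta \log \pi_\theta(u_t \mid x_t)^\top \big]

In expectation over u_t ~ π_θ the jump is
consistent with the dynamics of z_t above, which is
what Proposition 6 establishes via Theorem 3.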
11
Proof of Proposition 6
Since
then
Applying Theorem 3 directly, Proposition 6 is
proved.
12
Model-free Reinforcement Learning Algorithm
Let
In this stochastic approximation,
is observed, and
is given, so we only need to approximate
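
Reading the blanks from context (they were equation
images), the update is plausibly:

    z_{t+\delta} = z_t + \Delta x_t\, \nabla_\theta \log \pi_\theta(u_t \mid x_t)^\top + \delta\, \nabla_x f(x_t, u_t)\, z_t

The jump \Delta x_t = x_{t+\delta} - x_t is
observed, and \nabla_\theta \log \pi_\theta(u_t \mid x_t)
is given by the known policy, so only the model
term \nabla_x f(x_t, u_t)\, z_t remains to be
approximated, which the next slides do by least
squares.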
13
Least-Squares Approximation of
Define
the set of past discrete times t − cδ ≤ s ≤ t at
which action u_t has been taken.
From Taylor's formula, for all discrete times s,
We deduce
14
Where
We may derive an approximation of
by solving the least-squares problem
Then we have
Here
denote the average value of
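
The least-squares problem was shown as images. A
minimal Python sketch of the idea, assuming the
affine jump model Δx_s ≈ δ [f(x_t, u_t) +
∇_x f(x_t, u_t)(x_s − x_t)] over the window of past
times at which the current action was taken; the
name estimate_jump_model and its interface are
illustrative, not from the paper:

import numpy as np

def estimate_jump_model(x_t, xs, dxs, delta):
    # xs  : (n, d) past states at which action u_t was taken
    # dxs : (n, d) observed state jumps at those times
    # Fit dxs - mean(dxs) ~= delta * F @ (xs - mean(xs)) by least squares.
    xbar = xs.mean(axis=0)
    dbar = dxs.mean(axis=0)
    Ft, *_ = np.linalg.lstsq(xs - xbar, dxs - dbar, rcond=None)
    F_hat = Ft.T / delta              # estimate of grad_x f(x_t, u_t)
    # Recover f(x_t, u_t) from the fitted affine model evaluated at x_t.
    f_hat = dbar / delta + F_hat @ (x_t - xbar)
    return f_hat, F_hat

The averages xbar and dbar play the role of the
"average value" terms on this slide.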
15
Algorithm
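
The algorithm was shown only as an image. Below is
a hedged end-to-end sketch of the loop the deck
describes (stochastic discrete simulation,
model-free accumulation of z_t, pathwise gradient
from the terminal reward, gradient ascent), on a
toy integrator problem; the dynamics, features, and
step sizes are illustrative, not the paper's:

import numpy as np

rng = np.random.default_rng(0)
d, A = 2, 4                                  # state dim, number of actions
actions = np.array([[1., 0.], [-1., 0.], [0., 1.], [0., -1.]])
delta, T = 0.05, 1.0                         # time step, horizon
goal = np.array([0.5, 0.3])

def f(x, u):            # true dynamics, unknown to the learner:
    return u            # only the jumps dx are observed

def grad_R(x):          # gradient of the terminal reward R(x) = -|x - goal|^2
    return -2.0 * (x - goal)

def policy(theta, x):   # Boltzmann-like stochastic policy over the actions
    logits = theta @ x
    p = np.exp(logits - logits.max())
    return p / p.sum()

eta = 0.5               # gradient-ascent step size
theta = np.zeros((A, d))
for episode in range(200):
    x = np.zeros(d)
    z = np.zeros((d, A * d))   # z_t: sensitivity of x_t to theta (flattened)
    t = 0.0
    while t < T:
        p = policy(theta, x)
        a = rng.choice(A, p=p)
        dx = delta * f(x, actions[a])            # observed state jump
        # score: d/dtheta log pi_theta(a | x), flattened to length A*d
        score = ((np.eye(A)[a] - p)[:, None] * x[None, :]).ravel()
        # Model-free z update; the grad_x f term vanishes for this
        # integrator system (in general it is estimated by least squares,
        # as in the sketch on the previous slide).
        z = z + np.outer(dx, score)
        x = x + dx
        t += delta
    grad_J = grad_R(x) @ z                       # pathwise gradient estimate
    theta = theta + eta * grad_J.reshape(A, d)   # gradient ascent step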
16
Experimental Results
  • Six continuous state variables: (x0, y0) hand
    position; (x, y) mass position; (vx, vy) mass
    velocity
  • Four control actions: U = {(1,0), (0,1),
    (-1,0), (0,-1)}
  • Goal: reach a target (xG, yG) with the mass at
    a specified time T

Terminal reward function
17
The system dynamics
Consider a Boltzmann-like stochastic policy
where
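
The policy formula and the definitions after
"where" were images; a softmax form consistent with
"Boltzmann-like", where φ(x) are state features and
θ_u the parameter block for action u (both
illustrative):

    \pi_\theta(u \mid x) = \frac{\exp(\theta_u^\top \phi(x))}{\sum_{u' \in U} \exp(\theta_{u'}^\top \phi(x))}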
18
(Figure-only slide; no transcript.)
19
Conclusion
  • Described a reinforcement learning method for
    approximating the gradient of a continuous-time
    deterministic problem with respect to the control
    parameters
  • Used a stochastic policy to approximate the
    continuous system by a consistent stochastic
    discrete process