Title: Policy Gradient in Continuous Time
1. Policy Gradient in Continuous Time
by Remi Munos, JMLR 2006
Presented by Hui Li, Duke University Machine Learning Group, May 30, 2007
2. Outline
- Introduction
- Discretized Stochastic Processes Approximation
- Model-free Reinforcement Learning (RL) Algorithm
- Example Results
3. Introduction of the Problem
- Consider an optimal control problem with continuous state
System dynamics:
- Deterministic process
- Continuous state
- Objective: find an optimal control (u_t) that maximizes the functional
Objective function:
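A sketch of the setup in the paper's notation (assuming the fixed-horizon, terminal-reward formulation used in the paper's experiment):

    dx_t/dt = f(x_t, u_t),   x_0 given,   t ∈ [0, T]
    J(x_0; (u_t)_{0 ≤ t ≤ T}) = r(x_T)

Here f is the deterministic dynamics (unknown to the agent in the RL setting), u_t the control at time t, and r a terminal reward evaluated at the fixed horizon T.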
4. Introduction of the Problem
- Consider a class of parameterized policies π_α, with parameter α
- Find the parameter α that maximizes the performance measure
- Standard approach is to use a gradient ascent method
(computing this gradient is the object of the paper)
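In symbols (a sketch using the notation above; η is an assumed step-size symbol, and π_α(u|x) maps each state to a distribution over controls):

    V(α) := J(x_0; (u_t) generated by π_α)      (performance measure)
    α ← α + η ∇_α V(α)                          (gradient ascent)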
5. Introduction of the Problem
How to compute ∇_α V(α)?
- A direct approach requires a large number of trajectories to estimate the gradient of the performance measure.
- Pathwise estimation of the gradient: compute the gradient using only one trajectory.
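For concreteness, a finite-difference estimator (one example of a many-trajectory method, not necessarily the estimator shown on the slide) perturbs each parameter component separately:

    ∂V/∂α_i ≈ ( V(α + ε e_i) − V(α) ) / ε,   i = 1, …, m

so each gradient evaluation costs on the order of m full rollouts, and V(·) itself must be averaged over several runs when the policy is stochastic.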
6. Introduction of the Problem
Pathwise estimation of the gradient: part of the expression is known (it depends only on the chosen policy and the reward), and part is unknown (it depends on the system dynamics).
- In reinforcement learning the dynamics f is unknown. How can we approximate z_t?
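A sketch of the pathwise quantities involved (my notation, consistent with the later slides; d is the state dimension and m the number of policy parameters):

    z_t := ∇_α x_t                                     (state sensitivity, a d × m matrix)
    ∇_α V(α) = ∇_x r(x_T) · z_T
    dz_t/dt = ∇_x f̄_α(x_t) z_t + ∇_α f̄_α(x_t),   z_0 = 0
    where f̄_α(x) = Σ_u π_α(u|x) f(x, u)               (dynamics averaged under the policy)

∇_x r(x_T) can be evaluated once the terminal state is reached (the reward function is given), whereas z_T depends on the unknown dynamics f, which is why z_t must be approximated.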
7. Discretized Stochastic Processes Approximation
- A general convergence result (Theorem 3): conditions under which a discrete-time stochastic process converges to a continuous-time deterministic trajectory as the time step goes to zero.
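Roughly, results of this type say (a paraphrase of the usual consistency conditions, not the paper's exact statement of Theorem 3): if

    E[ x_{n+1} − x_n | x_n = x ] = Δ g(x) + o(Δ)   and   Var[ x_{n+1} − x_n | x_n = x ] = o(Δ),

then, as the step size Δ → 0, the discrete process converges to the solution of dx/dt = g(x).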
8. Discretization of the state
- Stochastic discrete state process
Initialization:
Jump in state:
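A minimal Euler-type sketch of such a process (the paper's construction may differ, e.g. in how long each sampled action is held):

    x_0^Δ = x_0                                  (initialization)
    u_n ~ π_α(· | x_n^Δ)                         (sample an action from the stochastic policy)
    x_{n+1}^Δ = x_n^Δ + Δ f(x_n^Δ, u_n)          (jump in state over one step of size Δ)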
9. Proof of Proposition 5
From Taylor's formula:
The average jump:
Applying Theorem 3 directly, Proposition 5 is proved.
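Under the Euler-type sketch above, the conditional mean of the state jump is

    E[ x_{n+1}^Δ − x_n^Δ | x_n^Δ = x ] = Δ Σ_u π_α(u|x) f(x, u) + o(Δ) = Δ f̄_α(x) + o(Δ),

which is the consistency with dx/dt = f̄_α(x) needed to invoke the convergence theorem.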
10. Discretization of the state gradient
- Stochastic discrete state gradient process
Initialization:
With:
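A hedged sketch of what such a gradient process can look like (the exact update in the paper may differ; ∇_α log π_α and ∇_x log π_α are the likelihood-ratio terms of the policy, known once the policy is chosen):

    z_0^Δ = 0      (the initial state does not depend on α)
    z_{n+1}^Δ = z_n^Δ + Δ ∇_x f(x_n, u_n) z_n^Δ
               + (x_{n+1} − x_n) [ ∇_α log π_α(u_n|x_n) + (z_n^Δ)ᵀ ∇_x log π_α(u_n|x_n) ]ᵀ

Taking the conditional expectation of this jump recovers Δ [ ∇_x f̄_α(x_n) z_n^Δ + ∇_α f̄_α(x_n) ] + o(Δ), i.e. the process is consistent with the continuous equation for z_t sketched earlier.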
11. Proof of Proposition 6
Since
then
Applying Theorem 3 directly, Proposition 6 is proved.
12. Model-free Reinforcement Learning Algorithm
Let
In this stochastic approximation, the state jump is observed and the policy's likelihood-ratio term is given; we only need to approximate the term involving the unknown dynamics.
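Concretely, in the sketched z-update the three kinds of terms are (again a sketch, matching the notation above):

    x_{n+1} − x_n                                    observed from the trajectory
    ∇_α log π_α(u_n|x_n), ∇_x log π_α(u_n|x_n)       given (the policy is chosen by us)
    ∇_x f(x_n, u_n) z_n^Δ                            must be approximated (f is unknown)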
13. Least-Squares Approximation of the Unknown Dynamics Term
Define
the set of past discrete times s, with t − c ≤ s ≤ t, at which action u_t has been taken.
From Taylor's formula, for each such discrete time s,
We deduce
14. where
We may derive an approximation of this quantity by solving the least-squares problem.
Then we have
Here, the barred quantities denote average values over the set of past times defined above.
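A sketch of the least-squares step, reconstructed from the Taylor-expansion argument on the previous slide (the paper's exact parameterization may differ). Writing S_t for the set of past times defined above, a first-order expansion of each observed jump gives

    (x_{s+1} − x_s)/Δ ≈ f(x_t, u_t) + ∇_x f(x_t, u_t) (x_s − x_t),   s ∈ S_t,

so estimates (b̂, Â) ≈ (f(x_t, u_t), ∇_x f(x_t, u_t)) are obtained by minimizing

    Σ_{s ∈ S_t} ‖ (x_{s+1} − x_s)/Δ − b − A (x_s − x_t) ‖²   over b ∈ R^d, A ∈ R^{d×d}.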
15. Algorithm
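Putting the pieces together, a compact Python sketch of one possible implementation of the loop described above. Everything below is illustrative rather than the paper's code: the stand-in dynamics f_true, the quadratic terminal reward, the linear-in-state Boltzmann policy, the window size, and the step sizes are all assumptions, and the z-update follows the hedged sketch from the state-gradient slide.

    import numpy as np

    rng = np.random.default_rng(0)

    # --- Illustrative stand-in environment (NOT the paper's hand/mass system) ---
    d, m_actions, T, dt = 2, 4, 1.0, 0.01
    N = int(T / dt)
    actions = np.array([[1, 0], [0, 1], [-1, 0], [0, -1]], dtype=float)
    target = np.array([0.5, 0.5])

    def f_true(x, u):
        """Dynamics, unknown to the agent: a damped integrator driven by action u."""
        return -0.5 * x + actions[u]

    def terminal_reward_grad(x):
        """Gradient of r(x_T) = -||x_T - target||^2 with respect to x_T."""
        return -2.0 * (x - target)

    # --- Boltzmann-like policy: pi_alpha(u|x) proportional to exp(alpha[u] . x) ---
    def policy_probs(alpha, x):
        logits = alpha @ x                      # one logit per action
        logits -= logits.max()
        p = np.exp(logits)
        return p / p.sum()

    def grad_log_pi(alpha, x, u, p):
        """Return (d/d_alpha log pi, d/dx log pi) for the sampled action u."""
        g_alpha = np.zeros_like(alpha)          # shape (m_actions, d)
        g_alpha[u] += x
        g_alpha -= np.outer(p, x)
        g_x = alpha[u] - p @ alpha              # shape (d,)
        return g_alpha, g_x

    def estimate_dynamics(history, u, x_t, window=50):
        """Least-squares fit of (f, grad_x f) at (x_t, u) from past jumps taken with
        the same action inside the window t-c <= s <= t (cf. slides 13-14)."""
        rows = [(xs, xn) for (xs, us, xn) in history[-window:] if us == u]
        if len(rows) <= d:                      # not enough data: crude one-sample fallback
            xs, _, xn = history[-1]
            return (xn - xs) / dt, np.zeros((d, d))
        X = np.array([np.concatenate(([1.0], xs - x_t)) for xs, _ in rows])
        Y = np.array([(xn - xs) / dt for xs, xn in rows])
        coef, *_ = np.linalg.lstsq(X, Y, rcond=None)   # coef has shape (1 + d, d)
        return coef[0], coef[1:].T              # b ~ f(x_t, u),  A ~ grad_x f(x_t, u)

    def run_episode(alpha, eta=0.05):
        x = np.zeros(d)
        z = np.zeros((d,) + alpha.shape)        # z ~ d x_t / d alpha
        history = []
        for n in range(N):
            p = policy_probs(alpha, x)
            u = rng.choice(m_actions, p=p)
            x_next = x + dt * f_true(x, u)      # the agent only sees this observed jump
            history.append((x.copy(), u, x_next.copy()))
            f_hat, A_hat = estimate_dynamics(history, u, x)   # f_hat unused: jump is observed directly
            g_alpha, g_x = grad_log_pi(alpha, x, u, p)
            lr = g_alpha + np.tensordot(g_x, z, axes=([0], [0]))   # likelihood-ratio term
            z = z + dt * np.tensordot(A_hat, z, axes=([1], [0])) \
                  + np.multiply.outer(x_next - x, lr)
            x = x_next
        grad_V = np.tensordot(terminal_reward_grad(x), z, axes=([0], [0]))
        return alpha + eta * grad_V             # one gradient-ascent step on the policy

    alpha = np.zeros((m_actions, d))
    for _ in range(200):
        alpha = run_episode(alpha)

The structural point to notice: only observed jumps and the known policy gradients enter the update, and the unknown dynamics Jacobian is replaced by its least-squares estimate, so a single trajectory per episode suffices to take a gradient step.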
16. Experimental Results
- Six continuous state variables: (x0, y0) hand position, (x, y) mass position, (vx, vy) mass velocity
- Four control actions: U = {(1,0), (0,1), (-1,0), (0,-1)}
- Goal: reach a target (xG, yG) with the mass at a specific time T
Terminal reward function:
17. The system dynamics
Consider a Boltzmann-like stochastic policy, where
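The generic form of such a policy, as a sketch (the state features φ actually used in the experiment are not reproduced here):

    π_α(u | x) = exp( α_u · φ(x) ) / Σ_{u'} exp( α_{u'} · φ(x) ),   u ∈ U,

whose likelihood-ratio term ∇_α log π_α(u|x) is available in closed form, as the algorithm requires.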
18. (No transcript)
19. Conclusion
- Described a reinforcement learning method for approximating the gradient of a continuous-time deterministic problem with respect to the control parameters
- Used a stochastic policy to approximate the continuous system by a consistent stochastic discrete process