Title: Statistical learning and optimal control

Slide 1: Statistical learning and optimal control: A framework for biological learning and motor control. Lecture 4: Stochastic optimal control. Reza Shadmehr, Johns Hopkins School of Medicine.
Slide 2: Summary: Optimal control of a linear system with quadratic cost.
Slide 3: Issues with the control policy
- What if the system gets perturbed during the control policy? With the current approach, there is no compensation for the perturbation.
- In reality, both the state update equation and the measurement equation are subject to noise. How do we take that into account?
- To resolve this, we need a way to figure out what command to produce, given that we find ourselves at some state x at some time k. Once we figure this out, we will consider the situation where we cannot measure x directly, but have noise to deal with. Our best estimate will be through the Kalman filter. This will link estimation with control.
Starting at state
Sequence of actions
Observations
Cost to minimize
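The equations these labels refer to were lost with the slide images. A standard formulation consistent with them, and with the matrices T, L, W, G used on the later slides, would be (the symbols here are assumptions, not recovered from the slide):

Starting at state: x_1
State update under actions u_1, ..., u_{p-1}: x_{k+1} = A x_k + B u_k
Observations: y_k = H x_k
Cost to minimize: J = \sum_{k=1}^{p} x_k^T T_k x_k + \sum_{k=1}^{p-1} u_k^T L u_k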
Slide 4: Note that at the last time step, the cost is a quadratic function of state.
Cost at the last time point
Cost-to-go at the next to the last time point
Slide 5: We will now show that if we choose the optimal u at step p-1, then the cost-to-go is once again a quadratic function of the state x.
Can be simplified to
Can be simplified to
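In the notation of slide 7 (cost-to-go matrices W, feedback gains G), the lost algebra presumably runs along these standard lines:

J_p = x_p^T W_p x_p, \quad W_p = T_p
J_{p-1} = x_{p-1}^T T_{p-1} x_{p-1} + u_{p-1}^T L u_{p-1} + (A x_{p-1} + B u_{p-1})^T W_p (A x_{p-1} + B u_{p-1})
\partial J_{p-1} / \partial u_{p-1} = 0 \;\Rightarrow\; u_{p-1} = -(L + B^T W_p B)^{-1} B^T W_p A \, x_{p-1} \equiv -G_{p-1} x_{p-1}
J_{p-1}^* = x_{p-1}^T W_{p-1} x_{p-1}, \quad W_{p-1} = T_{p-1} + A^T W_p (A - B G_{p-1})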
Slide 6: We just showed that for the last time step, the cost-to-go is a quadratic function of x.
The optimal u at time point p-1 minimizes the cost-to-go J(p-1).
If at time point p-1 we indeed carry out this optimal policy u, then the cost-to-go at time p-1 also becomes a quadratic function of x.
If we now repeat the process and find the optimal u for time point p-2, it will again be a linear function of the state (a feedback gain applied to x).
And if we apply the optimal u at time points p-2 and p-1, then the cost-to-go at time point p-2 will be a quadratic function of x.
So in general, if for time points t+1, ..., p we have calculated the optimal policy for u, then the above gives us a recipe to compute the optimal policy for time point t.
Slide 7: Summary of the linear quadratic tracking problem
Cost to go
The procedure is to compute the matrices W and G
from the last time point to the first time point.
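A minimal sketch of this backward pass in Python (assuming the recursion reconstructed above; the numpy arrays A, B, the per-step state costs T_list, and the motor cost L are user-supplied, not the lecture's code):

import numpy as np

def lqr_backward(A, B, T_list, L):
    """Compute feedback gains G[k] and cost-to-go matrices W[k]
    backwards from the last time point to the first."""
    p = len(T_list)
    W = [None] * p
    G = [None] * (p - 1)
    W[-1] = T_list[-1]                       # at the last step, W_p = T_p
    for k in range(p - 2, -1, -1):
        Wn = W[k + 1]
        # G_k = (L + B^T W_{k+1} B)^{-1} B^T W_{k+1} A
        G[k] = np.linalg.solve(L + B.T @ Wn @ B, B.T @ Wn @ A)
        # W_k = T_k + A^T W_{k+1} (A - B G_k)
        W[k] = T_list[k] + A.T @ Wn @ (A - B @ G[k])
    return W, G

At run time the command is then u_k = -G[k] @ x_k.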
Slide 8: Modeling of an elbow movement
Continuous time model of the elbow
Discrete time model of the elbow
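The model equations were lost with the images. A sketch of the usual construction, assuming second-order rotational dynamics I·θ̈ + b·θ̇ = u (the inertia and viscosity values below are illustrative, not the lecture's), with a first-order Euler discretization:

import numpy as np

I, b = 0.07, 0.2      # illustrative inertia (kg m^2) and viscosity (N m s/rad)
dt = 0.01             # 10 ms time step

# continuous time: d/dt [theta, theta_dot] = Ac @ x + Bc * u
Ac = np.array([[0.0, 1.0],
               [0.0, -b / I]])
Bc = np.array([[0.0],
               [1.0 / I]])

# discrete time: x[k+1] = A @ x[k] + B @ u[k]
A = np.eye(2) + Ac * dt
B = Bc * dt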
Slide 9: Goal: Reach a target at 30 deg in 300 ms and hold it there for 100 ms.
[Simulation figures: unperturbed movement; arm held at start for 200 ms; force pulse applied to the arm for 50 ms.]
Slide 10: Movement with a via point: we set the cost to be high at the times when we are supposed to be at the via points.
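As a sketch of how this can be encoded with the time-varying state cost T_k (the step count and weights below are illustrative assumptions; in the tracking form the cost is on the deviation x_k - r_k from the reference at those times):

import numpy as np

p = 50                                 # e.g., 500 ms at 10 ms per step
T_list = [np.zeros((2, 2)) for _ in range(p)]
T_list[p // 2] = np.diag([1e4, 0.0])   # heavy position cost at the via-point time
T_list[-1] = np.diag([1e4, 1e2])       # reach the target and hold (penalize velocity)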
Slide 11: Stochastic optimal control
Biological processes have noise. For example,
neurons fire stochastically in response to a
constant input, and muscles produce a stochastic
force in response to constant stimulation. Here
we will see how to solve the optimal control
problem with additive Gaussian noise.
Cost to minimize
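The lost equations presumably take the standard additive-noise form, with ε and η zero-mean Gaussian (covariances Q and R, as in the Kalman filter lecture; the symbols are assumptions, not recovered from the slide):

x_{k+1} = A x_k + B u_k + \varepsilon_k, \quad \varepsilon_k \sim N(0, Q)
y_k = H x_k + \eta_k, \quad \eta_k \sim N(0, R)
J = E\left[ \sum_{k=1}^{p} x_k^T T_k x_k + \sum_{k=1}^{p-1} u_k^T L u_k \right]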
Because there is noise, we are no longer able to
observe x directly. Rather, the best we can do
is to estimate it. As we saw before, for a
linear system with additive noise the best
estimate of state is through the Kalman filter.
So our goal is to determine the best command u
for the current estimate of x so that we can
minimize the global cost function. The approach is as before: at the last time point p the cost is a
quadratic function of x. We will find the
optimal motor command for time point p-1 so that
it minimizes the expected cost to go. If we
perform the optimal motor command at p-1, then we
will see that the cost to go at p-1 is again a
quadratic function of x.
Slide 12: Preliminaries: Expected value of a squared random variable. In the following example, we assume that x is the random variable.
Scalar x
Vector x
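The identities behind these two labels, for a random variable x with mean \bar{x} and variance \sigma^2 (scalar case) or covariance \Sigma (vector case), are:

Scalar x: \quad E[x^2] = \bar{x}^2 + \sigma^2
Vector x: \quad E[x^T W x] = \bar{x}^T W \bar{x} + \mathrm{tr}(W \Sigma)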
Slide 13: Cost at the last time point
Slide 14: Cost-to-go at the next-to-last time point
So we see that if our system has additive state or measurement noise, the optimal motor command remains the same as if the system had no noise at all. When we use the optimal policy at time
point p-1, we see that, as before, the cost-to-go
at p-1 is a quadratic function of x. The matrix
W at p-1 remains the same as when the system had
no noise. The problem is that we do not have x.
The best that we can do is to estimate x via the
Kalman filter. We do this in the next slide.
Slide 15: On trial p-1, our best estimate of x is the prior.
We compute the prior for the current trial from
the posterior of the last trial.
Kalman gain
The posterior estimate.
Our short-hand way to note the prior estimate of
x on trial p-1.
Although the noise in the system does not affect the control gain G, the estimate of x is of course affected by the noise, because the Kalman gain depends on it.
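For reference, the standard recursions these labels point to (prior denoted \hat{x}_{k|k-1}, posterior \hat{x}_{k|k}, with Q and R the state and measurement noise covariances):

\hat{x}_{k|k-1} = A \hat{x}_{k-1|k-1} + B u_{k-1}, \quad P_{k|k-1} = A P_{k-1|k-1} A^T + Q
K_k = P_{k|k-1} H^T (H P_{k|k-1} H^T + R)^{-1}
\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k (y_k - H \hat{x}_{k|k-1}), \quad P_{k|k} = (I - K_k H) P_{k|k-1}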
Slide 16: Summary of stochastic optimal control for a linear system with additive Gaussian noise and quadratic cost
Cost to go at the start
Cost to go at the end
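A minimal closed-loop sketch tying estimation to control (it reuses the lqr_backward function from the slide-7 sketch; all matrices are user-supplied, and this is an assumption about an implementation, not the lecture's code):

import numpy as np

def lqg_simulate(A, B, H, Q, R, T_list, L, x0, rng=np.random.default_rng(0)):
    """Closed-loop simulation: a Kalman filter estimate of the state
    is fed to the precomputed feedback gains."""
    W, G = lqr_backward(A, B, T_list, L)       # gains as in the noise-free case
    n = A.shape[0]
    x, xhat, P = x0.copy(), x0.copy(), np.eye(n) * 1e-6
    xs = [x.copy()]
    for k in range(len(G)):
        u = -G[k] @ xhat                       # control from the estimate
        x = A @ x + B @ u + rng.multivariate_normal(np.zeros(n), Q)
        y = H @ x + rng.multivariate_normal(np.zeros(H.shape[0]), R)
        # Kalman filter: predict from last posterior, then correct with y
        xhat, P = A @ xhat + B @ u, A @ P @ A.T + Q
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
        xhat = xhat + K @ (y - H @ xhat)
        P = (np.eye(n) - K @ H) @ P
        xs.append(x.copy())
    return np.array(xs)

Note that the gains G are computed offline exactly as in the noise-free case; only the state fed to them is now the Kalman estimate.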
Slide 17: The duality of the Kalman filter and optimal control
In the estimation problem, we have a model of how
we think the hidden states x are related to
observations y. Given an observation y, we have a
rule with which we can change our estimates.
Our objective is to minimize the trace of the variance of our estimate xhat; this variance is P. The trace is our scalar cost function, which is quadratic in terms of xhat. We minimize it by finding the optimal gain K. If we use this optimal K, then we can compute the variance at the next time step. Our cost (i.e., the variance) of course still remains quadratic in terms of xhat.
Slide 18: The duality of the Kalman filter and optimal control, continued.
In the control problem, we have a model of how we
think the hidden states x are related to commands
u and observations y. Our objective is to find
the u that minimizes a scalar cost. To find this
u, we run time backwards! We start at the end
time point and find the optimal u that minimizes
the cost to go. When we find this u, we then
move to the next time point and so on. The cost
to go is a quadratic function of hidden states.
This is very similar to the Kalman filter, where
the cost was a quadratic function of the hidden
states as well.
Slide 19: Duality of optimal control and Kalman filter, continued.
Optimal control ↔ Kalman filter:
- Motor cost L ↔ measurement noise R
- Weighting of state (tracking cost) T ↔ state noise Q
- Cost-to-go matrix W ↔ state uncertainty P
So W is like the state-uncertainty matrix P, the state-weighting matrix T plays the role of the state-update noise Q, and the motor cost L is like the measurement noise R. In optimal control, the motor commands are generated by applying a gain to the state; this gain is like the Kalman gain.
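Written out, the two Riccati recursions make the correspondence explicit (control runs backward in time, the filter forward; this is a standard reconstruction, not copied from the slide):

Control: \; W_k = T_k + A^T W_{k+1} A - A^T W_{k+1} B (L + B^T W_{k+1} B)^{-1} B^T W_{k+1} A
Filter: \; P_{k+1} = Q + A P_k A^T - A P_k H^T (R + H P_k H^T)^{-1} H P_k A^T

with W \leftrightarrow P, \; T \leftrightarrow Q, \; L \leftrightarrow R, \; A \leftrightarrow A^T, \; B \leftrightarrow H^T.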
Slide 20: Noise characteristics of biological systems are not additive Gaussian. Noise in the motor output grows with the size of the motor command.
[Figure: force traces under "Electrical stimulation of the muscle" and "Voluntary contraction of the muscle"; panels A and B are described in the caption below.]
The standard deviation of noise grows with mean
force in an isometric task. Participants produced
a given force with their thumb flexors. In one
condition (labeled voluntary), the participants
generated the force, whereas in another condition
(labeled NMES) the experimenters stimulated
their muscles artificially to produce force. To
guide force production, the participants viewed a
cursor that displayed thumb force, but the
experimenters analyzed the data during a 4-s
period in which this feedback had disappeared. A.
Force produced by a typical participant. The
period without visual feedback is marked by the
horizontal bar in the 1st and 3rd columns (top
right) and is expanded in the 2nd and 4th
columns. B. When participants generated force,
noise (measured as the standard deviation)
increased linearly with force magnitude.
Abbreviations: NMES, neuromuscular electrical stimulation; MVC, maximum voluntary contraction. From Jones et al. (2002) J Neurophysiol 88:1533.
Slide 21: Representing signal-dependent noise
Zero mean Gaussian noise
signal dependent motor noise
Zero mean Gaussian noise
signal dependent sensory noise
Vector of zero mean, variance 1 random variables
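In Todorov's (2005) notation these labels correspond to the following (the slide's own equations were lost; φ denotes the unit-variance random vector named on the last line):

Motor: \; x_{k+1} = A x_k + B u_k + \varepsilon_k + \textstyle\sum_i \phi_k^i C_i u_k
Sensory: \; y_k = H x_k + \eta_k + \textstyle\sum_i \phi_k^i D_i x_k

so the standard deviation of the motor noise grows with the size of u, and that of the sensory noise with the size of x.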
Slide 22: Control problem with signal-dependent noise (Todorov 2005)
Cost per step
To find the motor commands that minimize the
total cost, we start at the last time step p and
work backwards. At time step p, the cost is a
quadratic function of x. At time step p-1, we
can find the optimal u that minimizes the cost to
go. When we find this optimal u, the cost to go
at p-1 will be a quadratic function of x plus a
quadratic function of x-xhat. In general, by
induction we can prove that as long as we apply
the optimal u, the cost to go will have this
quadratic form. This proof is due to E. Todorov,
Neural Computation, 2005.
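Writing e = x - \hat{x} for the estimation error, the quadratic form referred to here is (a reconstruction following Todorov, 2005; the superscripts distinguish the two cost-to-go matrices):

J_k = x_k^T W_k^x x_k + e_k^T W_k^e e_k + \text{const}

with recursions for W^x and W^e running backward in time.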
Slide 23: Cost at time step p (the last time step)
Cost-to-go at p-1
Optimal u to minimize the cost-to-go at time step
p-1
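Following Todorov (2005), the minimizing command is a linear feedback on the state estimate; a reconstruction in this lecture's notation (not copied from the slide) is:

u_k = -G_k \hat{x}_k, \quad G_k = \left( L + B^T W_{k+1}^x B + \textstyle\sum_i C_i^T (W_{k+1}^x + W_{k+1}^e) C_i \right)^{-1} B^T W_{k+1}^x A

Note that the signal-dependent noise terms C_i now appear inside the gain: larger commands carry more noise, so they are penalized beyond the explicit motor cost L.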
Slide 24: J(p-1) is the cost-to-go at time step p-1, assuming that the optimal u is produced at p-1. Note that unlike the cost at time step p,
this cost-to-go is quadratic in terms of x and
the error in estimation of x. So now we need to
show that if we continue to produce the optimal u
at each time step, the cost-to-go remains in this
form for all time steps.
Slide 25: Conjecture: If at some time point k+1 the cost-to-go under an optimal control policy is quadratic in x and e, and provided that we produce a u that minimizes the cost-to-go at time step k, then the cost-to-go at time step k will also be quadratic. To prove this, our first step is to find the u that minimizes the cost-to-go at time step k, and then show that the resulting optimal cost-to-go remains in the quadratic form above.
To compute the expected value term, we need to do
some work on the term e.
Slide 26: To compute the expected value of J(k+1), we compute the expected value of the two quadratic terms (the expected value of the third term is zero, as it is composed only of zero-mean random variables).
Terms that do not depend on u
Slide 27: So we just showed that if at some time point k+1 the cost-to-go under an optimal control policy is quadratic in x and e, and provided that we produce a u that minimizes the cost-to-go at time step k, then the cost-to-go at time step k will also be quadratic. Since we had earlier shown that at time step p-1 the cost is quadratic in x and e, we now have the solution to our problem.
Slide 28: Summary: Control problem with signal-dependent noise (Todorov 2005)
Cost per step
For the last time step