Experience-Based Identification - PowerPoint PPT Presentation

Title: Experience-Based Identification

1
Experience-Based Identification and Control
via Higher-Level Reinforcement Learning

George G. Lendaris
NW Computational Intelligence Laboratory
Portland State University, Portland, OR
Supported by NSF ECS-0301022

Ideas Example (assume an experienced car driver)
I. Car attributes:
1) driving own car
2) driving a friend's car.
II. Environment: clear afternoon with
1) dry pavement
2) icy pavement.
III. Performance criteria:
1) Road race: minimize time.
2) Elderly relative on excursion: maximize comfort.
The driver uses the same base set of driving skills, but when changing from 1 to 2, makes adjustments to the control law and/or decision logic, drawn from a collection acquired via EXPERIENCE.
CONTEXT comprises I, II, and III.

NWCIL WCCI - IJCNN2006 2
3
CONTEXT: We formulate context as comprising three components: A) Plant, B) Environment, and C) Objectives (characterized via performance criteria, labeled CF).
Specification of all three yields a specific context; to each specific context there corresponds a particular control law; a change in any of the components results in a different context.
[Diagram: CONTEXT, made up of A. PLANT (.1, .2), B. ENVIRONMENT (.1, .2), and C. CF (.1, .2), shown together with the CONTROL LAW REPOSITORY (EXPERIENCE).]
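As a minimal sketch of the statement that each fully specified context corresponds to one control law, the repository can be pictured as a lookup table keyed by the (plant, environment, CF) triple. The Python below reuses the car-driving example; the `Context` type, the `REPOSITORY` table, and the control-law descriptions are illustrative assumptions, not part of the presented system.

```python
from typing import NamedTuple


class Context(NamedTuple):
    """One specific context: plant, environment, and criterion function (CF)."""
    plant: str
    environment: str
    cf: str


# Hypothetical experience repository: one control law per fully specified context.
REPOSITORY = {
    Context("own car", "dry pavement", "minimize time"): "late braking, high cornering speed",
    Context("own car", "icy pavement", "minimize time"): "early braking, gentle steering",
    Context("own car", "dry pavement", "maximize comfort"): "smooth acceleration, wide margins",
    Context("friend's car", "dry pavement", "minimize time"): "moderate braking, learn pedal feel",
}


def select_control_law(context: Context) -> str:
    """Return the control law stored for exactly this context."""
    return REPOSITORY[context]


baseline = Context("own car", "dry pavement", "minimize time")
# Changing any single component yields a different context, hence a different law.
icy = baseline._replace(environment="icy pavement")
print(select_control_law(baseline))
print(select_control_law(icy))
```

An exact-match table like this only covers contexts that have been seen before; the later slides on manifolds address how to index the repository so that nearby contexts can also be served.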
4

CONTEXT is fundamental to the approach, so we performed a historical overview of the control field vis-à-vis the explicit role that context has (or has not) played in the various formulations and approaches.
Phase 1: DESIGN BASED on INTUITION and INVENTION.
Phase 2: DESIGN BASED on MATHEMATICAL TOOLS.
Phase 3: DESIGN for CONTEXT DEPENDENCE.
e.g., Adaptive Control and Learning Control accommodate a modicum of variation in context via on-line parameter adjustments.
Phase 4: (next slide)
5
Phase 4 (new): DESIGN for EXPERIENCE-BASED PROCESSES, including AUTONOMOUS CONTEXT DISCERNMENT and MODEL SELECTION.
Stipulated requirements for this phase: the Agent has the ability a) to use experience for model selection (plant or controller), and b) to do so effectively and efficiently.
Fundamental aspects to consider: 1) context, 2) discerning the current context, 3) selecting an appropriate model from the experience repository for the discerned context, and 4) doing the latter two in an effective and efficient manner. (Aspects 3, 4, and potentially 2 entail a memory property.)
6

KEY IDEA of HLLA (Higher-Level Learning Algorithm)
Re-purpose the Reinforcement Learning method (to a higher level) such that:
- instead of using it to design an optimal controller for a given task,
- an already achieved collection of such solutions for a variety of related contexts is provided (as an experience repository), and
- HLLA creates a strategy for optimally selecting a solution from the repository.
7
Conceptual layout of EB Control process
[Flowchart. Block labels: Starting Condition; Context Monitoring by Agent (context awareness); CONTROLLER; PLANT; Criterion Function Assessment (All OK / Off Nominal); Agent Performs SID (EB) (context discernment); EB-UPDATED PLANT MODEL; Agent Performs Controller SELECTION (EB); EB-UPDATED CONTROLLER MODEL; Run Simulation; Criterion Function Assessment (All OK / Off Nominal); Install Updated Controller Design.]
8

Populating the Repository
In practice, one might build the repository piece by piece:
- via available design tools (e.g., Phase 2 tools), generate controllers for a given application, and
- collect them into a repository, along with a list of attributes (parameters) that can be used as an index to facilitate their selection, BUT
9

Populating the Repository continued
It is difficult
- to define the list of attributes to serve as useful indexing mechanisms, and
- to come up with a useful parameterization of the context for the given task.
We note that the choice of representations and associated mappings directly influences subsequent
- efficiency of access,
- notion of nearness, and
- notion of generalization.
10

Populating the Repository continued
For research purposes, start with a synthetic method:
- employ an analytic equation, neural network, etc.; their parameters provide built-in indexing mechanisms
- e.g., if the plant is known to be linear, employ, say, a fifth-order transfer function, with the function's coefficients used to define the indexing
- e.g., if known to be second-order linear, then use a second-order transfer function
-> access via 2-dim. vs. 5-dim. space
-> efficiency and generalization implications
11

MANIFOLDS
It is important to endow the index with
- the property of being searchable, and
- an operational notion of nearness.
The mathematical construct of manifolds (from geometric topology) provides a useful formalism for this application:
- a set of elements, S, and
- a coordinate system in R^n (a one-to-one mapping from S to R^n that specifies each element in S via a vector of n real numbers, a.k.a. the coordinates of the element).
R^n is searchable, and nearness is Euclidean.
(The terms "index" and "coordinates" are here synonymous.)
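To illustrate what a searchable index with Euclidean nearness buys, here is a small sketch in Python: stored models are keyed by their coordinates in R^2, and a query point retrieves the closest stored element. The coordinate values, the model names, and the `nearest_model` helper are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical repository: each stored model is indexed by its coordinates in R^2.
# What the two coordinates mean (e.g. two transfer-function coefficients) is assumed.
REPOSITORY = {
    (0.5, 1.0): "model_A",
    (1.0, 1.0): "model_B",
    (1.0, 2.0): "model_C",
    (2.0, 0.5): "model_D",
}


def nearest_model(query):
    """Return (coordinates, model) of the stored element closest to `query`."""
    coords = min(REPOSITORY, key=lambda c: math.dist(c, query))
    return coords, REPOSITORY[coords]


coords, model = nearest_model((0.9, 1.8))
print(coords, model)
```

The linear scan is enough for a handful of entries; a larger repository would presumably want a spatial index, and the dimension of the coordinate space directly affects how costly that search becomes.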
12
MAPPINGS BETWEEN CONTEXT SPACE AND VARIOUS
MANIFOLDS (? repository set part)
In general, how does one craft an appropriate
mapping from the full Context Space (whatever
form of representation is employed) to the
coordinate system of the control manifold?
Strategy is to employ HLLA to learn the mappings.
13
EBSID Example: Neural Network as Plant.
The goal of this SID process was to train a NN (called the CDN) to select a NN from a neural manifold (the plant model repository) to match the behavior of an observed NN (the plant/system).
14
EBSID Example: Pole-Cart Plant. This benchmark system was used to further develop the ideas.
The repository was populated synthetically via a parameterized set of equations for the pole-cart plant. The parameters explicitly included the pole length and mass, which were varied during the experiment. The task was to discern changes in context (pole length, pole mass) when they occurred.
After the CDN is trained, it functions to adjust the manifold coordinate values (pole length and mass). These parameter values instantiate a plant model in the repository. Once the CDN issues null adjustments, the appropriate model has been selected (specified via the current values of CD) and is to be used by the Agent to select a corresponding controller.
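The adjust-until-null loop described above can be sketched as follows. This is only an illustration under loud assumptions: a damped pendulum stands in for the pole-cart equations, and a simple compass search stands in for the trained CDN (the slides do not give the CDN's internals). All names, constants, and bounds below are hypothetical.

```python
import math

G = 9.81              # gravity, m/s^2
FRICTION = 0.5        # assumed pivot friction coefficient
DT = 0.01             # integration step, s
STEPS = 150           # length of the observation window
LOW, HIGH = 0.2, 2.0  # assumed bounds on both manifold coordinates


def simulate(length, mass, theta0=0.5):
    """Angle trajectory of a damped pendulum (toy stand-in for the pole-cart model)."""
    theta, omega, angles = theta0, 0.0, []
    for _ in range(STEPS):
        alpha = -(G / length) * math.sin(theta) - FRICTION / (mass * length**2) * omega
        omega += DT * alpha
        theta += DT * omega
        angles.append(theta)
    return angles


def model_error(coords, observed):
    """Sum of squared differences between the model at `coords` and the observed plant."""
    return sum((a - b) ** 2 for a, b in zip(simulate(*coords), observed))


def discern(observed, start=(0.5, 0.5), step=0.1, min_step=1e-5):
    """Adjust (length, mass) until no adjustment improves the match ('null adjustment')."""
    coords, err = start, model_error(start, observed)
    while step > min_step:
        candidates = []
        for k in range(2):
            for delta in (step, -step):
                cand = list(coords)
                cand[k] += delta
                if LOW <= cand[k] <= HIGH:
                    candidates.append(tuple(cand))
        best = min(candidates, key=lambda c: model_error(c, observed))
        best_err = model_error(best, observed)
        if best_err < err:
            coords, err = best, best_err  # accept the adjustment
        else:
            step /= 2                     # nothing helps at this size: refine
    return coords, err


observed = simulate(0.8, 0.3)  # "actual" plant after a context change
initial_error = model_error((0.5, 0.5), observed)
(length, mass), final_error = discern(observed)
print(f"discerned length={length:.3f}, mass={mass:.3f}, error={final_error:.2e}")
```

The point of the sketch is the control flow: coordinates are nudged while the selected model's state error shrinks, and the selection is considered done once no further adjustment is issued. A trained CDN would presumably reach that point in far fewer model evaluations than this blind search.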
15
Demonstration of Context Discernment in response to a change in Pole-Cart parameter values (context change) at every 50th iteration.
[Plot: errors between state variable values for the pole-cart system and for the models selected during the discernment process.]
16
Extension of Example 3: first foray into mixed representations.
Use NN models of the pole-cart plant in the repository instead of equations.
A NN with 174 weights was trained to emulate the plant for a given length/mass combination. Using this trained NN as a starting point, eight new NNs were trained for different mass/length (M/L) combinations. The weights of the resulting NNs were analyzed for changes from the original (base) case. A sensitivity-variance metric was crafted and used to select 22 weights. A new NN with 152 weights frozen to the base-case design and with 22 adjustable weights was then trained for the same eight cases, yielding models all within tolerance.
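The slides do not define the sensitivity-variance metric, so the following Python is only a guess at the general shape of such a selection step: it ranks each weight by the variance of its deviation from the base case across the retrained networks and keeps the top few as adjustable. Plain variance is a stand-in for the actual metric, and the sizes are shrunk (6 weights, 3 retrained networks, 2 selected) purely for readability.

```python
from statistics import pvariance

# Toy sizes (assumed): 6 weights and 3 retrained networks instead of 174 and 8.
base_weights = [0.10, -0.40, 0.25, 0.80, -0.05, 0.60]
retrained = [
    [0.10, -0.10, 0.25, 0.81, -0.05, 0.20],
    [0.11, -0.70, 0.25, 0.80, -0.05, 0.90],
    [0.10, -0.45, 0.24, 0.79, -0.05, 0.55],
]


def select_adjustable(base, variants, count):
    """Rank weights by the variance of their deviation from the base case; keep the top `count`."""
    scores = []
    for i, w0 in enumerate(base):
        deviations = [variant[i] - w0 for variant in variants]
        scores.append((pvariance(deviations), i))
    top = sorted(scores, reverse=True)[:count]
    return sorted(i for _, i in top)


adjustable = select_adjustable(base_weights, retrained, count=2)
frozen = [i for i in range(len(base_weights)) if i not in adjustable]
print("adjustable:", adjustable, "frozen:", frozen)
```

In the setting described above, the analogous split would leave 22 adjustable weights as repository coordinates and 152 weights frozen at their base-case values.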
17

Extension of Example 3: first foray into mixed representations (continued)
- These NNs now populate the repository, with the 22 adjustable weights as its coordinates.
- Even though the pole M and L are the two context parameters that are changing, the representation available to the Agent comprises 22 parameters.
This is a more difficult task than the previous example, as the model plant and actual plant have different forms of representation.
The HLLA is to design a CDN that selects the appropriate NN for a given M/L combination, and it may have to develop mappings between the two sets of coordinates.
NOT ACCOMPLISHED YET, as we moved on to the next task.
18
To move beyond working with just simulations, we have begun the initial stages of implementing a robot with Context Discerning and Controller Selecting capability at NWCIL.
A Sony AIBO robotic dog has been modified into a research platform. Its task is to learn to discern changes in walking surface type and adjust its gait accordingly.
19

AIBO experiments to date
1. Constructed five different surface types (4 long): hardwood, thin foam, thin carpet, thick shag carpet, reversed shag carpet.
2. Used a genetic algorithm to develop a good AIBO gait for each of the five surfaces.
Note: each resulting gait yields a better performance measure on its respective surface than does the default gait; the differences are visually discernible.
3. AIBO made test walks on the five surfaces for each gait, and data streams from 17 joint-actuator sensors were recorded.
4. An AR model was computed and stored for each of the 25 sets of sensory experiences.
20

AIBO experiments (cont.)
5. Now, when walking on a surface, AIBO discerns the surface type (CONTEXT) by processing its current kinesthetic experience through the models in its repository, and selects the most similar one.
6. It then selects the gait corresponding to this surface, and adjusts its walk accordingly.
7. Show video
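Steps 4 and 5 can be sketched in a few lines of Python: fit one autoregressive (AR) model per recorded walk, then classify a new sensor stream by which stored model predicts it with the least error. Everything specific here is an assumption made for illustration: the AR order (2), the single synthetic sinusoidal "sensor" per surface (the experiments recorded 17 joint-actuator sensors), the frequencies, and the use of squared one-step prediction error as the similarity measure.

```python
import math


def fit_ar2(x):
    """Least-squares fit of x[t] = a1*x[t-1] + a2*x[t-2]; returns (a1, a2)."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for t in range(2, len(x)):
        s11 += x[t - 1] * x[t - 1]
        s12 += x[t - 1] * x[t - 2]
        s22 += x[t - 2] * x[t - 2]
        r1 += x[t] * x[t - 1]
        r2 += x[t] * x[t - 2]
    det = s11 * s22 - s12 * s12  # solve the 2x2 normal equations directly
    return (r1 * s22 - r2 * s12) / det, (r2 * s11 - r1 * s12) / det


def prediction_error(model, x):
    """Sum of squared one-step prediction errors of an AR(2) model on stream x."""
    a1, a2 = model
    return sum((x[t] - a1 * x[t - 1] - a2 * x[t - 2]) ** 2 for t in range(2, len(x)))


def synthetic_stream(freq, phase, n=200):
    """Stand-in for one joint sensor: a sinusoid whose frequency depends on the surface."""
    return [math.sin(freq * t + phase) for t in range(n)]


# Hypothetical repository: one AR model per surface, fitted from a recorded walk.
SURFACE_FREQ = {"hardwood": 0.3, "thin carpet": 0.6, "thick shag carpet": 0.9}
repository = {name: fit_ar2(synthetic_stream(f, 0.0)) for name, f in SURFACE_FREQ.items()}


def discern_surface(stream):
    """Return the surface whose stored model best predicts the current stream."""
    return min(repository, key=lambda name: prediction_error(repository[name], stream))


current = synthetic_stream(0.6, 1.2)  # new walk: same surface, different phase
print(discern_surface(current))
```

Once the surface is discerned, step 6 reduces to a lookup from surface name to the gait that was evolved for it.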
21
CONCLUSION: We currently conjecture that the proposed experience-based approach will usher in a whole new phase of development of the decision and controls fields, making a significant stride toward the achievement of more human-like decision and control.
We also conjecture that the context discernment concepts plus the manifolds representation will provide a basis for constructing learning agents capable of long-term, rapidly accessible memory. If so, this could pave the way for scaling neural systems to brain-like capabilities.
22
I wish to acknowledge the creative role filled by the various graduate students who have worked in the NWCIL, both present and past. These include, but are not limited to, in alphabetical order:
Michael Carroll, Lars Holmstrom, Steven Hutsell, Bryan Johnson, Joe Lotz, Shari Matzner, Jay McCarthy, Christian Paintz, Alec Rogers, Adreas Rustan, Larry Schultz, Steve Shervais, Andrew Toland, Sangsuree Vasupongayya.
Particular acknowledgements go to my Lab Managers: Thaddeus Shannon and Roberto Santiago.
Also, NSF Grants ECS-9904378 and ECS-0301022.
24
Demonstration of Context Discernment in response to a change in NN plant parameter values (context change) at every 100th iteration.
25

CONTEXT DISCERNMENT
- When done relative to the plant, entails a form of System Identification (SID).
- Employ the notion of experience repository for the SID task as well (comprising relevant plant models).
- HLLA creates a strategy for optimally selecting a model from the repository.
26

SELECTION
- The selection process is triggered by the Agent
becoming aware that a change in context has
occurred (contextual awareness)
Followed by the Agent seeking information about
what changed (context discernment)
And finally, by selection.

27

At NWCIL, experience is being addressed
- via a notion of experience repository, and
- via a novel concept for applying Reinforcement Learning / Adaptive Critics vis-à-vis the experience repository
-> Higher-Level Learning Algorithm (HLLA).
28
System configuration during DHP Design of
Controller Augmentation System (2003)
29

[Plot: Pitch w/ cg Shift. Legend: Blue: LoFLYTE w/ Unaugmented Control; Red: LoFLYTE w/ Augmented Control; Black: LoFLYTE]
30

[Plot: Roll 1. Legend: Blue: LoFLYTE w/ Unaugmented Control; Red: LoFLYTE w/ Augmented Control; Black: LoFLYTE]
31

[Plot: Roll 2. Legend: Blue: LoFLYTE w/ Unaugmented Control; Red: LoFLYTE w/ Augmented Control; Black: LoFLYTE]
32
DEFINITIONS
1. Agent: a computational intelligence device.
2. Context Variables (Agent centric): those attributes of i) the environment and ii) the plant/process whose variations could engender changes to the decision rule / control policy employed by the Agent while accomplishing the Agent's current objective or goal; and in addition, iii) the criteria (representing the objective or goal) to be used for designing and subsequently selecting the decision rule or control law. We use the term Criterion Function (CF) to represent these criteria.
3. Context Space (Agent centric): a vector space in which each context variable is associated with a dimension. The Context Space is conceptualized as comprising three sub-spaces, one each associated with the i) Plant, ii) Environment, and iii) Criterion Function.
4. Context (Agent centric): a point in Context Space; the set of values taken on by the context variables in a given situation.
33

5. Context Discernment: the act or process of determining the current values of the context variables (the current point in Context Space) appropriate to the task being performed. Webster on-line for "discern": to recognize or identify as separate and distinct.
6. Experience: a two-component concept.
Component A: Repository of previously developed context-specific models (controller, plant, or CF models), and
Component B: Algorithms used by the Agent to effectively and efficiently select a model from the repository as changes in context occur.
Note: A key task of the HLLA is to train the Agent to learn Component B.
7. Selection: the act of choosing/retrieving the appropriate element of the repository corresponding to the discerned context.
8. Higher-Level Learning Algorithm (HLLA): The reference level for the term "higher" is the case where learning algorithms are applied directly to the design of optimal controllers, as in Learning Control, ones that would be accumulated in the repository. "Higher-Level" here means applying the learning method to create a strategy for selecting a good controller from the repository, where the process of selection is optimized. Definition of the Utility function (CF) is key for application of this process.
34
OBSERVATION 1: In the case of humans, the more knowledge / experience attained, the more improvement in effectiveness of performing new related tasks, with little or no speed penalty.
OBSERVATION 2: In the case of AI rule-based systems, the more knowledge attained, the slower the processing.
RESEARCH OBJECTIVE: Develop a computational intelligence device (Agent) that employs experience to enhance the effectiveness and efficiency of certain processes, i.e., endow these processes with more human-like attributes.
35

Since CONTEXT is fundamental to the approach, we performed a historical overview of the control field vis-à-vis the explicit role that context has (or has not) played in the various formulations and approaches.
The overview was also motivated by the belief that adding the capability to employ experience in the controller design / selection process will usher in a qualitatively new phase in the evolution of the controls field.
36

Phase 1: DESIGN BASED on INTUITION and INVENTION.
- control devices date to antiquity
- a well-known recent device is the flyball governor (James Watt, 1788)
- the design of such devices was the product of intuition and inventive genius, with little support from mathematically based tools, and with no explicit notion of context.
37

Phase 2: DESIGN BASED on MATHEMATICAL TOOLS.
- mathematics has played a fundamental role in developing the control field
- Maxwell used differential equations to analyze the flyball governor dynamics, ca. 1870
- followed by Fourier and Laplace transforms, state space methods, stochastic methods, Hilbert space methods, and algebraic and geometric topological methods
- design is done off-line
38
Phase 2 (continued)
- contains design methods where the controller is placed in service with no associated mechanism for modifying its design in response to changes in context
- each controller design is based on a single point in the Context Space or, at most, a small neighborhood of points
- this phase includes at least the following well-known design methods: Classical Control, Modern Control, Optimal Control, Stochastic Control, and Robust Control
39

Phase 3: DESIGN for CONTEXT DEPENDENCE.
- such large variation in context that fixed controllers are not sufficient
- first, discern the current context
- then, use a previously designated process to adjust controller parameters, based on observations
- e.g., Adaptive Control and Learning Control accommodate a modicum of variation in context via on-line parameter adjustments
- the mechanism for performing accommodations is distinct from that defined for Phase 4.
41
EB IDENTIFICATION AND CONTROL
Generic structure for developing proposed
Experience Based System Identifier
42
EB IDENTIFICATION AND CONTROL cont.
Generic structure for developing the proposed Experience-Based Controller
A good EB-Algorithm minimizes the number of selection cycles: it has learned (via the Adaptive Critic method) to make optimal use of a priori knowledge.