Title: Machine reconstruction of human control strategies
1. Machine reconstruction of human control strategies
- Dorian Šuc
- Artificial Intelligence Laboratory
- Faculty of Computer and Information Science
- University of Ljubljana, Slovenia
2. Overview
- Skill reconstruction and behavioural cloning
- The learning problem
- A problem decomposition for behavioural cloning (indirect controllers, experiments, advantages)
- Symbolic and qualitative skill reconstruction
- Learning qualitative strategies: the QUIN algorithm
- QUIN in skill reconstruction
- Conclusions
3. Skill reconstruction and behavioural cloning
- Motivation
- understanding of the human skill
- development of an automatic controller
- ML approach to skill reconstruction: learn a control strategy from data logged from skilled human operators (execution traces). Later called behavioural cloning (Michie, 93).
- Early work: Chambers and Michie (69); learning control by imitation also by Donaldson (60, 64)
4. Behavioural cloning: some applications
- Original approach: clones usually induced as a direct mapping from states to actions, in the form of trees or rule sets
- Successfully used in domains such as:
- pole balancing (Michie et al., 90)
- piloting (Sammut et al., 92; Camacho, 95)
- container cranes (Urbančič, 94)
- production line scheduling (Kerr and Kibira, 94)
- Reviews in Sammut (96) and Bratko et al. (98)
5. Learning problem
- Execution traces are used as examples for ML to induce:
- a control strategy (comprehensible, symbolic)
- an automatic controller (criterion of success)
- Operator's execution trace: a sequence of system states and the corresponding operator's actions, logged to a file at a certain frequency
- Reconstruction of human control skill:
- Skill: know-how at the subsymbolic level, operational
- Strategy: explicitly described know-how at the symbolic level
6. Container crane
Used in ports for load transportation.
Control forces: Fx, FL. State: X, dX, Φ, dΦ, L, dL.
Based on previous work of Urbančič (94). Control task: transport the load from the start to the goal position.
7. Learning problem (cont.)
Excerpt of an execution trace:
  Fx      FL    X     dX     Φ      dΦ     L      dL
  0       0     0.00  0.00   0.00   0.00   20.00  0.00
  2500    0     0.00  0.00  -0.00  -0.01   20.00  0.00
  6000    0     0.00  0.01  -0.01  -0.02   20.00  0.00
  10000   0     0.02  0.10  -0.07  -0.27   20.00  0.00
  14500   0     0.12  0.31  -0.32  -0.85   20.00  0.00
  14500   0     0.35  0.59  -0.95  -1.49   20.00  0.01
  ...
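For illustration, a trace like the excerpt above can be loaded into per-variable time series with a few lines of Python. This is only a sketch: the file name and the exact column layout are assumptions based on the excerpt, not part of the thesis.

```python
# Minimal sketch: load a logged execution trace (whitespace-separated columns
# Fx, FL, X, dX, Phi, dPhi, L, dL, as in the excerpt above) for later learning.
import numpy as np

COLUMNS = ["Fx", "FL", "X", "dX", "Phi", "dPhi", "L", "dL"]

def load_trace(path):
    """Return a dict mapping each logged variable to its time series."""
    data = np.loadtxt(path)                 # one row per sampled time step
    return {name: data[:, i] for i, name in enumerate(COLUMNS)}

# Example usage (hypothetical file name):
# trace = load_trace("operator_run1.txt")
# actions = np.stack([trace["Fx"], trace["FL"]], axis=1)       # operator's actions
# states = np.stack([trace[v] for v in COLUMNS[2:]], axis=1)   # system states
```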
8. Problems of the original approach
- Difficulties observed with the original approach:
- No guarantee of inducing, with high probability, a successful clone (Urbančič and Bratko, 94)
- Low robustness of clones
- Comprehensibility: clones are hard to understand
Michie (93, 95) suggests that a kind of problem decomposition could be helpful: learning from exemplary performance requires more than mindless imitation. Recent approaches to behavioural cloning: (Stirling, 95; Bain and Sammut, 99; Camacho, 2000).
9. Related work
- Leech (86): probably the first goal-structured learning of control
- CHURPs (Stirling, 95): separates control skill into planning and actuation phases; focuses on the planning component; assumes the goals are given
- GRAIL (Bain and Sammut, 99): learning goals by decision trees and effects by abduction
- Incremental Correction Model (Camacho, 2000): homeostatic and achievable goals; parametrised decision trees to learn goals; wrapper approach
10. Our approach
- Our goals
- transparency of the induced strategies
- robust and successful controllers
- Ideas
- Learning problem decomposition: (a) learning of the constraints on the operator's trajectories, (b) learning of the system's dynamics
- Generalized trajectory as a continuous subgoal
- Symbolic and qualitative constraints, use of domain knowledge
- Differences with related approaches
- continuous generalized trajectory
- qualitative strategies
11. Experimental domains
- Container crane
- we used execution traces from (Urbančič, 94)
- Acrobot (DeJong, 95; Sutton, 96)
- two-link pendulum in a gravitational field; swing-up task
- Bicycle riding (Randløv and Alstrøm, 98)
- drive the bike from the start to the goal position; requires simultaneous balancing and goal-aiming
- Simulators used in all experiments
- Measure of success
- time to accomplish the task
12. Operator's trajectory
- A sequence of the states from an execution trace
- A path in the state space
Figure: operator's trajectory of the trolley velocity (dX) in the space of X, Φ and dX.
13. Generalized trajectory
Induced constraints on the operator's trajectory.
- Constraints can be represented as
- trees
- equations
- qualitative constraints
14. Qualitative and quantitative strategy
- Quantitative strategy: given with precise numerical values or numeric constraints (decision tree, equation)
- Qualitative strategy: may also use qualitative constraints. A qualitative strategy defines a set of quantitative strategies.
- We use qualitatively constrained functions (QCFs): monotonicity constraints as used in qualitative reasoning
15. Qualitatively constrained functions
- M+(x): an arbitrary monotonically increasing function of x
- A QCF is a generalization of M+, similar to the qualitative proportionality predicates used in QPT (Forbus, 84)
Gas in a container: Pres = c·Temp/Vol, c = nR > 0
QCF: Pres = M+,-(Temp, Vol)
Qualitative behaviour implied by the QCF, for example:
- Temp steady, Vol decreases: Pres increases
- Temp increases, Vol decreases: Pres increases
- Temp increases, Vol steady: Pres increases
- Temp increases, Vol increases: change in Pres is ambiguous
- Temp decreases, Vol decreases: change in Pres is ambiguous
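To make the QCF semantics concrete, here is a minimal sketch (a hypothetical helper, not part of QUIN) that computes the qualitative change predicted by a QCF such as M+,-(Temp, Vol) from the qualitative changes of its arguments.

```python
# Minimal sketch of QCF semantics: for a QCF such as Pres = M+,-(Temp, Vol),
# the predicted qualitative change of the class follows from the signs of the
# argument changes. Returns 'pos', 'neg', 'zero' or None (ambiguous).
def qcf_prediction(signs, changes):
    """signs: +1/-1 per argument (e.g. (+1, -1) for M+,-);
    changes: qualitative change per argument, each -1, 0 or +1."""
    effects = {s * c for s, c in zip(signs, changes) if s * c != 0}
    if not effects:
        return "zero"          # no argument changes -> class steady
    if effects == {+1}:
        return "pos"
    if effects == {-1}:
        return "neg"
    return None                # opposing influences -> change is ambiguous

# Gas example, Pres = M+,-(Temp, Vol):
# Temp up, Vol down -> both influences push Pres up
assert qcf_prediction((+1, -1), (+1, -1)) == "pos"
# Temp up, Vol up -> opposing influences, ambiguous
assert qcf_prediction((+1, -1), (+1, +1)) is None
```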
16. Problem decomposition
17. Direct and indirect controllers
- Direct controllers: the original approach, BOXES, ASE/ACE
- Indirect controllers: our approach; also CHURPs (Stirling, 95), GRAIL (Bain and Sammut, 99), ICM (Camacho, 2000)
18. Robustness of direct and indirect controllers against learning error
- Experiment: modelling the learning of direct and indirect controllers with some learning error
- direct controllers: the correct action corrupted with noise
- indirect controllers: the correct trajectory corrupted with noise
- Two error models
- Gaussian noise
- Biased Gaussian noise (all errors in the same direction)
- A simple, deterministic, discrete-time system
- Control task: reach and maintain the goal value Xg
- Performance criterion: controller error in Xg
19. Robustness of direct and indirect controllers against learning error (2)
Biased noise affects direct controllers much more
20. Possible advantages of indirect controllers
- Less prone to departures from the operator's trajectory
- More robust against changes in the system's dynamics and small changes in the task
- Generalizing the trajectory is often easier than generalizing the actions
- The generalized trajectory is often easier to understand (fewer details)
21. Symbolic and qualitative skill reconstruction
GoldHorn (Križman, 98)
LWR (Atkeson et al., 97)
- Experiments in the crane and acrobot domains
22. Experiments in the crane domain
- GoldHorn induced a generalized trajectory of the trolley velocity, approximately
  dXdes = 0.902 - 0.018·X² + 0.090·X + 0.050·Φ
- Qualitative strategy: if X ≤ Xmid then dXdes = M+,+(X, Φ), else dXdes = M-,+(X, Φ)
23. Transforming qualitative into quantitative strategies
- By concretizing the qualitative parameters into real numeric values or real-valued functions
- First experiment: using randomly generated functions satisfying the qualitative constraints and additional domain knowledge (a generation sketch follows below)
- maximal and minimal values of the state variables
- the trolley starts towards the goal
- the trolley stops at the goal
- Second experiment: using additional domain knowledge
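The "randomly generated functions" bullet could look roughly like the sketch below: sample a random monotonically increasing map per argument and combine the maps with positive weights, so the result satisfies a QCF such as dXdes = M+,+(X, Φ). The variable ranges and weights are illustrative assumptions, not values from the thesis.

```python
# Minimal sketch (not the thesis' exact procedure): concretize a QCF such as
# dXdes = M+,+(X, Phi) into one random quantitative function.
import random

def random_increasing_map(lo, hi, n_knots=8, rng=random):
    """Random piecewise-linear, monotonically increasing map from [lo, hi] to [0, 1]."""
    ys = [0.0] + sorted(rng.random() for _ in range(n_knots)) + [1.0]
    xs = [lo + (hi - lo) * i / (len(ys) - 1) for i in range(len(ys))]
    def f(x):
        x = min(max(x, lo), hi)
        for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
            if x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        return 1.0
    return f

def concretize_qcf(arg_ranges, signs, weights, rng=random):
    """Return one random real-valued function consistent with the QCF signs."""
    maps = [random_increasing_map(lo, hi, rng=rng) for lo, hi in arg_ranges]
    def f(*args):
        return sum(w * s * m(a) for w, s, m, a in zip(weights, signs, maps, args))
    return f

# dXdes = M+,+(X, Phi): increasing in both trolley position X and rope angle Phi
# (ranges and weights below are assumed for illustration)
rng = random.Random(1)
dx_des = concretize_qcf(arg_ranges=[(0.0, 60.0), (-0.2, 0.2)],
                        signs=[+1, +1], weights=[1.0, 0.3], rng=rng)
print(dx_des(10.0, 0.0), dx_des(40.0, 0.0))   # larger X -> larger desired velocity
```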
24. Efficiency of the qualitative strategy
- The results show that the qualitative strategy is
- general (the proper selection of the qualitative parameters is not crucial)
- successful, and offers space for controller optimization
- Similar experiments in the acrobot domain
25. Qualitative induction
- Motivation: our experiments with qualitative strategies (crane, acrobot)
- The usual classification learning problem, but learning qualitative trees
- the leaves contain qualitatively constrained functions (QCFs); QCFs give constraints on the class change in response to a change in the attributes
- the internal nodes (splits) define a partition of the state space into areas with common qualitative behaviour of the class variable
26. Qualitatively constrained function (QCF)
- M+(x): an arbitrary monotonically increasing function of x
- A QCF is a generalization of M+, similar to the qualitative proportionality predicates used in QPT (Forbus, 84)
Gas in a container: Pres = c·Temp/Vol, c = nR > 0
QCF: Pres = M+,-(Temp, Vol)
Qualitative behaviour implied by the QCF, for example:
- Temp steady, Vol decreases: Pres increases
- Temp increases, Vol decreases: Pres increases
- Temp increases, Vol steady: Pres increases
- Temp increases, Vol increases: change in Pres is ambiguous
- Temp decreases, Vol decreases: change in Pres is ambiguous
27. Learning QCFs
Example data for Pres = 2·Temp/Vol:
  Temp    Vol    Pres
  315.00  56.00  11.25
  315.00  62.00  10.16
  330.00  50.00  13.20
  300.00  50.00  12.00
  300.00  55.00  10.90
- Learning of the most consistent QCF:
- For each pair of examples form a qualitative change vector
- Select the QCF with minimal error-cost
28. Learning QCFs
An example qualitative change vector for a pair of examples: qTemp = neg, qVol = neg, qPres = pos.
Inconsistent (Incons.) and ambiguous (Amb.) qualitative change vectors for each candidate QCF:
  QCF               Incons.   Amb.
  M+(Temp)          3         1
  M-(Temp)          2,4       1
  M+(Vol)           1,2,3     /
  M-(Vol)           4         /
  M+,+(Temp,Vol)    1,3       2
  M+,-(Temp,Vol)    /         3,4
  M-,+(Temp,Vol)    1,2       3,4
  M-,-(Temp,Vol)    4         2
Select the QCF with minimal error-cost.
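The pairwise counting behind this kind of table can be sketched as follows for the two-argument candidate QCFs. The real QUIN error-cost is MDL-based and more refined; this only illustrates how inconsistent and ambiguous pairs are identified for each candidate.

```python
# Minimal sketch of the pairwise idea behind QCF learning: form a qualitative
# change vector for every pair of examples and count, for each candidate QCF,
# the pairs it predicts inconsistently or ambiguously.
from itertools import combinations, product

def sign(x, eps=1e-9):
    return 0 if abs(x) < eps else (1 if x > 0 else -1)

def qcf_prediction(signs, changes):
    """Predicted qualitative change of the class: +1, -1, 0, or None (ambiguous)."""
    effects = {s * c for s, c in zip(signs, changes) if s * c != 0}
    if not effects:
        return 0
    return effects.pop() if len(effects) == 1 else None

# Gas example: attributes (Temp, Vol), class Pres = 2*Temp/Vol
examples = [(315.0, 56.0, 11.25), (315.0, 62.0, 10.16), (330.0, 50.0, 13.20),
            (300.0, 50.0, 12.00), (300.0, 55.0, 10.90)]

for signs in product([+1, -1], repeat=2):          # candidate QCFs over (Temp, Vol)
    incons = ambig = 0
    for a, b in combinations(examples, 2):
        changes = [sign(b[0] - a[0]), sign(b[1] - a[1])]
        pred = qcf_prediction(signs, changes)
        if pred is None:
            ambig += 1                              # QCF gives no prediction for this pair
        elif pred != sign(b[2] - a[2]):
            incons += 1                             # prediction contradicts the data
    name = "M" + ",".join("+" if s > 0 else "-" for s in signs) + "(Temp,Vol)"
    print(f"{name:18s} inconsistent={incons} ambiguous={ambig}")
```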
29. Learning a qualitative tree
- For every possible split, divide the examples into two subsets, find the most consistent QCF for both subsets, and select the split minimizing the tree error-cost (based on MDL); a greedy sketch of this recursion is given below
- Algorithm ep-QUIN uses every pair of examples
- An improvement: the heuristic QUIN algorithm, which also considers the locality and consistency of qualitative change vectors
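A greedy recursion in the spirit of ep-QUIN might look like the sketch below. The cost here is a plain count of inconsistent or ambiguous example pairs, standing in for the MDL-based tree error-cost of the real algorithm.

```python
# Greedy sketch: try every threshold split on every attribute, score each split
# by the summed QCF cost of the two subsets, and recurse while the cost drops.
from itertools import combinations, product

def sign(x, eps=1e-9):
    return 0 if abs(x) < eps else (1 if x > 0 else -1)

def best_qcf(examples):
    """Return (cost, signs) of the most consistent QCF; examples end with the class value."""
    n_attr = len(examples[0]) - 1
    best = None
    for signs in product([+1, -1], repeat=n_attr):
        cost = 0
        for a, b in combinations(examples, 2):
            changes = {s * sign(b[i] - a[i]) for i, s in enumerate(signs)} - {0}
            pred = 0 if not changes else (changes.pop() if len(changes) == 1 else None)
            if pred is None or pred != sign(b[-1] - a[-1]):
                cost += 1                      # inconsistent or ambiguous pair
        if best is None or cost < best[0]:
            best = (cost, signs)
    return best

def build_tree(examples, min_leaf=4):
    """Return ('leaf', signs) or ('split', attr, threshold, left_tree, right_tree)."""
    leaf_cost, signs = best_qcf(examples)
    best_split = None
    for attr in range(len(examples[0]) - 1):
        for thr in sorted({e[attr] for e in examples})[:-1]:
            left = [e for e in examples if e[attr] <= thr]
            right = [e for e in examples if e[attr] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = best_qcf(left)[0] + best_qcf(right)[0]
            if best_split is None or cost < best_split[0]:
                best_split = (cost, attr, thr, left, right)
    if best_split is None or best_split[0] >= leaf_cost:
        return ("leaf", signs)                 # no split improves the QCF cost
    _, attr, thr, left, right = best_split
    return ("split", attr, thr, build_tree(left, min_leaf), build_tree(right, min_leaf))

# Usage (hypothetical data): examples = [(x, phi, dphi, dx_des), ...]; build_tree(examples)
```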
30. Experimental evaluation in artificial domains
- A set of artificial domains with uniformly distributed attributes and 2 irrelevant attributes
- Results by QUIN are better than those by ep-QUIN
- In simple domains QUIN finds qualitative relations corresponding to our intuition
31. QUIN in bicycle riding
- Control task
- drive the bike from the start to the goal position
- the bike's speed is assumed constant
- difficult because balancing and goal-aiming must be performed simultaneously
- Controlled by the torque applied to the handlebars
- State: goalAngle, goalDist, θ, dθ, ω, dω
- QUIN: θdes = f(State)
32. Induced qualitative strategy
Induced qualitative tree for θdes, written as rules:
- goalAngle > 0.015: θdes = M+,+,-(ω, dω, goalAngle)
- -0.027 < goalAngle ≤ 0.015: θdes = M+,+(ω, dω)
- goalAngle ≤ -0.027: θdes = M+,+,-(ω, dω, goalAngle) (the same QCF as in the first leaf)
33. Induced qualitative strategy
- goalAngle near zero: θdes = M+,+(ω, dω) (balancing)
- otherwise: θdes = M+,+,-(ω, dω, goalAngle) (balancing and goal-aiming)
- Goal-aiming: turn the front wheel away from the goal
- If the bike starts falling over, turn the front wheel in the direction of the fall
34. Transforming qualitative into quantitative strategies
- Transform the QCFs into real-valued functions by using simple domain knowledge (one possible instance is sketched below):
- maximal front-wheel deflection
- drive straight if the bike is aiming at the goal: f(0, 0, 0) = 0
- balancing is more important than aiming at the goal
- 400 randomly generated quantitative strategies: 59.2% successful
- Test of robustness
- Change of the start state (58% successful)
- Random displacement of the bicyclist from the mass center (26% successful)
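One possible quantitative instance consistent with the induced qualitative strategy and the domain knowledge above is sketched here. The linear form, the gains, and the maximal-deflection and near-zero thresholds are assumptions for illustration only, not values from the thesis.

```python
# Illustrative sketch of one quantitative instance of the induced strategy:
# desired handlebar angle theta_des increasing in omega and d_omega (balancing)
# and decreasing in goalAngle (goal-aiming), clipped to a maximal deflection,
# with f(0, 0, 0) = 0. All constants below are assumptions.
MAX_DEFLECTION = 1.0          # assumed maximal front-wheel deflection [rad]
GOAL_ANGLE_NEAR_ZERO = 0.02   # assumed width of the "near zero" region [rad]
K_OMEGA, K_DOMEGA, K_GOAL = 4.0, 1.0, 0.5   # assumed gains; balancing terms dominate

def theta_des(omega, d_omega, goal_angle):
    """Desired handlebar angle from bike tilt omega, tilt rate d_omega and goalAngle."""
    u = K_OMEGA * omega + K_DOMEGA * d_omega          # balancing: M+,+(omega, d_omega)
    if abs(goal_angle) > GOAL_ANGLE_NEAR_ZERO:
        u -= K_GOAL * goal_angle                      # goal-aiming: decreasing in goalAngle
    return max(-MAX_DEFLECTION, min(MAX_DEFLECTION, u))

# Aiming at the goal and upright -> drive straight
assert theta_des(0.0, 0.0, 0.0) == 0.0
# Falling over (omega > 0) -> turn the front wheel in the direction of the fall
assert theta_des(0.1, 0.0, 0.0) > 0.0
# Goal off to one side (goal_angle > 0, assumed sign convention)
# -> turn the front wheel away from the goal when upright
assert theta_des(0.0, 0.0, 0.1) < 0.0
```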
35. QUIN in the crane domain
- Crane control requires trolley control and rope control
- Experiments with traces of 2 operators using different control styles
- Rope control
- QUIN: Ldes = f(X, dX, Φ, dΦ, dL)
- Often a very simple strategy was induced: Ldes = M+(X), i.e. bring down the load as the trolley moves from the start to the goal position
36. Trolley control
- QUIN: dXdes = f(X, Φ, dΦ)
- More diversity in the induced strategies; enables reconstruction of individual differences in control styles
Induced qualitative trees for the two operators, written as rules:
First tree:
- X < 20.7: dXdes = M+(X)
- 20.7 ≤ X < 29.3: dXdes = M+,+,-(X, Φ, dΦ)
- X ≥ 29.3: dXdes = M-(X)
Second tree:
- X < 60.1 and dΦ < -0.02: dXdes = M+(Φ)
- X < 60.1 and dΦ ≥ -0.02: dXdes = M-(X)
- X ≥ 60.1: dXdes = M-,+(X, Φ)
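For illustration, such an induced qualitative tree can be stored and queried as in the sketch below, which uses the thresholds of the first tree above as an example; evaluating the tree returns the QCF that constrains dXdes in a given region of the state space.

```python
# Minimal sketch of storing and querying an induced qualitative tree:
# internal nodes are (attribute, threshold, left, right) and leaves name a QCF.
from typing import Union

Tree = Union[tuple, str]

def qcf_for_state(tree: Tree, state: dict) -> str:
    """Walk the qualitative tree and return the QCF label active in `state`."""
    while isinstance(tree, tuple):
        attr, threshold, left, right = tree
        tree = left if state[attr] < threshold else right
    return tree

# Thresholds taken from the first reconstructed tree above (illustrative only)
trolley_tree: Tree = ("X", 20.7,
                      "M+(X)",
                      ("X", 29.3,
                       "M+,+,-(X, Phi, dPhi)",
                       "M-(X)"))

print(qcf_for_state(trolley_tree, {"X": 10.0}))   # -> M+(X): accelerate early on
print(qcf_for_state(trolley_tree, {"X": 45.0}))   # -> M-(X): decelerate towards the goal
```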
37. Role of human intervention
- The approach facilitates the use of user knowledge
- In our experiments the following types of human intervention were used:
- Selection of the dependent trajectory variable
- Disregarding some state variables
- Selection and analysis of the induced equations
- Using domain knowledge when transforming qualitative into quantitative strategies
- According to empirical evidence, different (sensible) choices and uses of domain knowledge also give successful strategies
38. Contributions of the thesis
- A decomposition of the behavioural cloning problem into the learning of a continuous generalized trajectory and of the system's dynamics
- Modelling of human skill with symbolic and qualitative constraints
- The QUIN algorithm for learning qualitative constraint trees
- Applying QUIN to skill reconstruction
- Experimental evaluation in several dynamic domains
39. Further work
- Applying QUIN in other domains where qualitative models are preferred; QUIN improvements
- Qualitative simulation to generate possible explanations of a qualitative strategy
- Reducing the space of admissible controllers by qualitative reasoning
- Minimizing the trajectory-constraint error in all the state variables, which would remove the need to select the dependent trajectory variable