Title: Machine reconstruction of human control strategies
1Machine reconstruction of human control strategies
- Dorian Šuc
- Artificial Intelligence Laboratory
- Faculty of Computer and Information Science
- University of Ljubljana, Slovenia
- Skill reconstruction and behavioural cloning
- The learning problem
- A problem decomposition for behavioural cloning (indirect controllers, experiments, advantages)
- Symbolic and qualitative skill reconstruction
- Learning qualitative strategies: the QUIN algorithm
- QUIN in skill reconstruction
- Conclusions
3Skill reconstruction and behavioural cloning
- Motivation
- understanding of the human skill
- development of an automatic controller
- ML approach to skill reconstruction: learn a control strategy from the logged data from skilled human operators (execution trace). Later called behavioural cloning (Michie, 93).
- Early work: Chambers and Michie (69); learning control by imitation also by Donaldson (60, 64)
4Behavioural cloning some applications
- Original approach: clones usually induced as a direct mapping from states to actions, in the form of trees or rule sets
- Successfully used in domains such as
- pole balancing (Michie et al., 90)
- piloting (Sammut et al., 92; Camacho, 95)
- container cranes (Urbancic, 94)
- production line scheduling (Kerr and Kibira, 94)
- Reviews in Sammut (96), Bratko et al. (98)
5Learning problem
- Execution traces used as examples for ML to induce
- a control strategy (comprehensible, symbolic)
- automatic controller (criterion of success)
- Operator's execution trace
- a sequence of system states and corresponding operator's actions, logged to a file at a certain frequency
- Reconstruction of human control skill
- Skill: know-how at subsymbolic level, operational
- Strategy: explicitly described know-how at symbolic level
6Container crane
Used in ports for load transportation
Control forces: Fx, FL
State: X, dX, Φ, dΦ, L, dL
Based on previous work of Urbancic (94)
Control task: transport the load from the start to the goal position
7Learning problem, cont.
Fx     FL  X     dX    Φ      dΦ     L      dL
0      0   0.00  0.00   0.00   0.00  20.00  0.00
2500   0   0.00  0.00  -0.00  -0.01  20.00  0.00
6000   0   0.00  0.01  -0.01  -0.02  20.00  0.00
10000  0   0.02  0.10  -0.07  -0.27  20.00  0.00
14500  0   0.12  0.31  -0.32  -0.85  20.00  0.00
14500  0   0.35  0.59  -0.95  -1.49  20.00  0.01
...
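A minimal sketch of how such an execution trace could be loaded and turned into state/action examples for direct cloning. The whitespace-separated file layout, the `Sample` record, the field name `Phi` for the rope angle, and the function `parse_trace` are all assumptions made for illustration; they are not taken from the original logging software.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One logged time step: two control forces followed by six state variables."""
    Fx: float
    FL: float
    X: float
    dX: float
    Phi: float
    dPhi: float
    L: float
    dL: float


def parse_trace(text: str) -> List[Sample]:
    """Parse whitespace-separated rows; skip the header and malformed lines."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue
        try:
            values = [float(f) for f in fields]
        except ValueError:
            continue  # non-numeric row, e.g. the column header
        samples.append(Sample(*values))
    return samples


# First rows of the trace shown on the slide (assumed column layout).
TRACE = """\
Fx FL X dX Phi dPhi L dL
0 0 0.00 0.00 0.00 0.00 20.00 0.00
2500 0 0.00 0.00 -0.00 -0.01 20.00 0.00
6000 0 0.00 0.01 -0.01 -0.02 20.00 0.00
"""

trace = parse_trace(TRACE)

# Examples for direct cloning: each state is paired with the operator's action.
examples = [((s.X, s.dX, s.Phi, s.dPhi, s.L, s.dL), (s.Fx, s.FL)) for s in trace]
```

Each element of `examples` is a (state, action) pair, which is the form a tree or rule learner would consume in the original approach.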
8Problems of original approach
- Difficulties observed with the original approach
- No guarantee of inducing with high probability a successful clone (Urbancic and Bratko, 94)
- Low robustness of clones
- Comprehensibility of clones: hard to understand
Michie (93, 95) suggests that a kind of problem decomposition could be helpful: learning from exemplary performance requires more than mindless imitation
Recent approaches to behavioural cloning (Stirling, 95; Bain and Sammut, 99; Camacho, 2000)
9Related work
- Leech (86): probably the first goal-structured learning of control
- CHURPs (Stirling, 95): separates control skills into planning and actuation phases; focuses on the planning component; assumes the goals are given
- GRAIL (Bain and Sammut, 99): learning goals by decision trees and effects by abduction
- Incremental Correction Model (Camacho, 2000): homeostatic and achievable goals; parametrised decision trees to learn goals; wrapper approach
10Our approach
- Our goals
- transparency of the induced strategies
- robust and successful controllers
- Ideas
- Learning problem decomposition: (a) learning of the constraints on the operator's trajectories, (b) learning of the system's dynamics
- Generalized trajectory as a continuous subgoal
- Symbolic and qualitative constraints, use of domain knowledge
- Differences with related approaches
- continuous generalized trajectory
- qualitative strategies
11Experimental domains
- Container crane
- we used execution traces from (Urbancic, 94)
- Acrobot (DeJong, 95; Sutton, 96)
- two-link pendulum in a gravitational field; swing-up task
- Bicycle riding (Randløv and Alstrøm, 98)
- drive the bike from the start to the goal position; requires simultaneous balancing and goal-aiming
- Simulators used in all experiments
- Measure of success
- time to accomplish the task
12Operator's trajectory
- A sequence of the states from an execution trace
- Path in the state space
Operator's trajectory of the trolley velocity (dX) in the space of X, Φ and dX
13Generalized trajectory
Induced constraints on the operator's trajectory
- Constraints can be represented as
- trees
- equations
- qualitative constraints
14Qualitative and quantitative strategy
- Quantitative strategy: given with precise numerical values or numeric constraints (decision tree, equation)
- Qualitative strategy: may also use qualitative constraints. A qualitative strategy defines a set of quantitative strategies
- We use qualitatively constrained functions (QCFs): monotonicity constraints as used in qualitative reasoning
15Qualitatively constrained functions
- M+(x): arbitrary monotonically increasing function of x
- A QCF is a generalization of M+, similar to the qualitative proportionality predicates used in QPT (Forbus)
Gas in the container: Pres = c · Temp / Vol, c = n R > 0
QCF: Pres = M+,-(Temp, Vol)
[Table of qualitative changes: rules of the form "change in Temp, change in Vol ⇒ change in Pres", the first with Temp = std; the direction arrows did not survive extraction]
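A short worked check of why the gas law fits this QCF, assuming Temp > 0, Vol > 0 and a positive constant c. The signs of the partial derivatives give the monotonicity directions:

```latex
\[
\mathit{Pres} = c\,\frac{\mathit{Temp}}{\mathit{Vol}}, \qquad
\frac{\partial\,\mathit{Pres}}{\partial\,\mathit{Temp}} = \frac{c}{\mathit{Vol}} > 0, \qquad
\frac{\partial\,\mathit{Pres}}{\partial\,\mathit{Vol}} = -\,c\,\frac{\mathit{Temp}}{\mathit{Vol}^{2}} < 0 .
\]
```

Pres is therefore increasing in Temp and decreasing in Vol, which is what M+,-(Temp, Vol) states. When Temp and Vol change in the same direction the two influences oppose each other, so the QCF alone does not determine the direction of change of Pres.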
16Problem decomposition
17Direct and indirect controllers
- Direct controllers: original approach, BOXES, ASE/ACE
- Indirect controllers: our approach; also CHURPs (Stirling, 95), GRAIL (Bain and Sammut, 99), ICM (Camacho, 2000)
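A minimal sketch of the indirect scheme, under stated assumptions: a toy point mass stands in for the controlled system, `predict_next` stands in for a learned dynamics model, and `desired_velocity` stands in for an induced generalized trajectory. None of these functions or constants come from the thesis; they only illustrate how an action can be chosen from a trajectory constraint plus a dynamics model instead of a direct state-to-action mapping.

```python
from typing import Callable, Sequence, Tuple

State = Tuple[float, float]  # (position x, velocity v) of a toy point mass


def predict_next(state: State, action: float, dt: float = 0.1) -> State:
    """Assumed dynamics model: the action is an acceleration."""
    x, v = state
    v_next = v + action * dt
    return (x + v_next * dt, v_next)


def desired_velocity(x: float, goal: float = 10.0) -> float:
    """Assumed generalized trajectory: desired velocity as a function of position."""
    return 0.5 * (goal - x)


def indirect_action(
    state: State,
    actions: Sequence[float],
    model: Callable[[State, float], State] = predict_next,
    trajectory: Callable[[float], float] = desired_velocity,
) -> float:
    """Pick the action whose predicted next state deviates least from the trajectory."""
    def deviation(action: float) -> float:
        x_next, v_next = model(state, action)
        return abs(v_next - trajectory(x_next))
    return min(actions, key=deviation)


ACTIONS = (-1.0, 0.0, 1.0)
print(indirect_action((0.0, 0.0), ACTIONS))   # far from the goal, at rest
print(indirect_action((10.0, 2.0), ACTIONS))  # at the goal, still moving
```

A direct controller would instead learn the mapping from `state` to `action` in one step; here the two learned components stay separate, so either can be inspected or replaced on its own.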
18Robustness of direct and indirect controllers against learning error
- Experiment: modelling learning of direct and indirect controllers with some learning error
- direct controllers: correct action + noise
- indirect controllers: correct trajectory + noise
- Two error models
- Gaussian noise
- Biased Gaussian noise (all errors in the same direction)
- Simple, deterministic, discrete-time system
- Control task: reach and maintain the goal value Xg
- Performance criterion: controller error in Xg
19Robustness of direct and indirect controllers against learning error (2)
Biased noise affects direct controllers much more
20Possible advantages of indirect controllers
- Less prone to departure from the operator's trajectory
- More robust against changes in the system's dynamics and small changes in the task
- Generalizing the trajectory is often easier than generalizing the actions
- Generalized trajectory is often easier to understand (fewer details)
21Symbolic and qualitative skill reconstruction
GoldHorn (Križman, 98)
LWR (Atkeson et al., 97)
- Experiments in the crane and acrobot domains
22Experiments in the crane domain
- GoldHorn induced the generalized trajectory of the trolley velocity
- dXdes = 0.902 - 0.018 X^2 + 0.090 X + 0.050 Φ
Qualitative strategy: if X ≤ Xmid then dX = M+,+(X, Φ) else dX = M-,+(X, Φ)
23Transforming qualitative into quantitative
- By concretizing qualitative parameters into real, numeric values or real-valued functions
- First experiment: using randomly generated functions satisfying qualitative constraints and additional domain knowledge
- maximal and minimal values of the state variables
- the trolley starts towards goal
- the trolley stops at goal
- Second experiment: using additional domain
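One way to generate random functions that satisfy a qualitative constraint can be sketched as follows. This is an illustrative construction, not the generator used in the experiments: each argument gets a random strictly monotone piecewise-linear function, and the results are combined with random positive weights, which keeps the required direction of monotonicity in every argument. The helper names, the variable bounds and the output scale `v_max` are made-up values standing in for the domain knowledge mentioned above.

```python
import bisect
import random
from typing import Callable, List, Sequence, Tuple


def random_monotone(lo: float, hi: float, increasing: bool,
                    rng: random.Random, knots: int = 8) -> Callable[[float], float]:
    """Random strictly monotone piecewise-linear function on [lo, hi] with range [0, 1]."""
    steps = [rng.uniform(0.1, 1.0) for _ in range(knots)]  # strictly positive increments
    total = sum(steps)
    ys: List[float] = [0.0]
    for step in steps:
        ys.append(ys[-1] + step / total)
    xs = [lo + (hi - lo) * i / knots for i in range(knots + 1)]

    def f(x: float) -> float:
        x = min(max(x, lo), hi)  # clamp to the known variable bounds
        i = min(bisect.bisect_right(xs, x) - 1, knots - 1)
        t = (x - xs[i]) / (xs[i + 1] - xs[i])
        y = ys[i] + t * (ys[i + 1] - ys[i])
        return y if increasing else 1.0 - y

    return f


def random_qcf(signs: str, bounds: Sequence[Tuple[float, float]],
               v_max: float, rng: random.Random) -> Callable[..., float]:
    """Random function obeying a QCF such as '-+', scaled to the interval [0, v_max]."""
    parts = [random_monotone(lo, hi, s == "+", rng) for s, (lo, hi) in zip(signs, bounds)]
    raw = [rng.uniform(0.1, 1.0) for _ in parts]
    weights = [w / sum(raw) for w in raw]
    return lambda *args: v_max * sum(w * p(a) for w, p, a in zip(weights, parts, args))


rng = random.Random(0)
# Hypothetical bounds for X and for the rope angle; v_max is also illustrative.
strategy = random_qcf("-+", [(0.0, 60.0), (-0.5, 0.5)], v_max=3.0, rng=rng)
print(strategy(10.0, 0.0), strategy(20.0, 0.0))
```

Repeated calls with different seeds yield different quantitative strategies that all belong to the same qualitative strategy; further domain constraints (for example a required value at the goal) would have to be imposed on top of this.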
24Efficiency of the qualitative strategy
- The results show that the qualitative strategy is
- general (the proper selection of qualitative parameters is not crucial)
- successful; offers space for controller optimization
- Similar experiments in the acrobot domain
25Qualitative induction
- Motivation: our experiments with qualitative strategies (crane, acrobot)
- Usual classification learning problem, but learning of qualitative trees
- in leaves are qualitatively constrained functions (QCFs); QCFs give constraints on the class change in response to a change in attributes
- internal nodes (splits) define a partition of the state space into areas with common qualitative behaviour of the class variable
27Learning QCFs
Pres = 2 Temp / Vol
Temp    Vol    Pres
315.00  56.00  11.25
315.00  62.00  10.16
330.00  50.00  13.20
300.00  50.00  12.00
300.00  55.00  10.90
- Learning of the most consistent QCF
- For each pair of examples form a qualitative change vector
- Select the QCF with minimal error-cost
28Learning QCFs
QCF               Incons.  Amb.
M+(Temp)          3        1
M-(Temp)          2,4      1
M+(Vol)           1,2,3    /
M-(Vol)           4        /
M+,+(Temp,Vol)    1,3      2
M+,-(Temp,Vol)    /        3,4
M-,+(Temp,Vol)    1,2      3,4
M-,-(Temp,Vol)    4        2
Example qualitative change vector: qTemp = neg, qVol = neg, qPres = pos
Select the QCF with minimal error-cost
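The consistency table can be reproduced with a short sketch. It takes the four qualitative change vectors from the first example to each of the other four, which matches the vector numbers 1 to 4 used in the table. Two points are assumptions: a vector in which no attribute of the QCF changes is counted as ambiguous (this agrees with the M+(Temp) row), and the final selection rule (fewest inconsistencies, then fewest ambiguities) is only a stand-in for the error-cost, whose exact definition is not given on the slide.

```python
from itertools import product

# Examples from the slide: (Temp, Vol, Pres), with Pres = 2 * Temp / Vol.
EXAMPLES = [
    (315.0, 56.0, 11.25),
    (315.0, 62.0, 10.16),
    (330.0, 50.0, 13.20),
    (300.0, 50.0, 12.00),
    (300.0, 55.0, 10.90),
]
ATTRS = ("Temp", "Vol")


def sign(d):
    return (d > 0) - (d < 0)


def change_vector(a, b):
    """Signs of change (qTemp, qVol, qPres) when moving from example a to example b."""
    return tuple(sign(y - x) for x, y in zip(a, b))


def classify(qcf, vec):
    """Compare a QCF {attribute index: +1 or -1} with one change vector."""
    effects = {s * vec[i] for i, s in qcf.items() if vec[i] != 0}
    if len(effects) != 1:
        return "ambiguous"  # no relevant attribute changed, or influences oppose
    return "consistent" if effects.pop() == vec[-1] else "inconsistent"


def all_qcfs(n_attrs):
    """Every sign assignment over every non-empty subset of attributes."""
    for signs in product((0, 1, -1), repeat=n_attrs):
        qcf = {i: s for i, s in enumerate(signs) if s}
        if qcf:
            yield qcf


def name(qcf):
    signs = ",".join("+" if s > 0 else "-" for s in qcf.values())
    return f"M{signs}({','.join(ATTRS[i] for i in qcf)})"


VECTORS = [change_vector(EXAMPLES[0], e) for e in EXAMPLES[1:]]

table = {}
for qcf in all_qcfs(len(ATTRS)):
    kinds = [classify(qcf, v) for v in VECTORS]
    incons = [k for k, kind in enumerate(kinds, 1) if kind == "inconsistent"]
    amb = [k for k, kind in enumerate(kinds, 1) if kind == "ambiguous"]
    table[name(qcf)] = (incons, amb)

# Stand-in for the error-cost: fewest inconsistencies, then fewest ambiguities.
best = min(table, key=lambda q: (len(table[q][0]), len(table[q][1])))
print(best, table[best])
```

On these examples only M+,-(Temp,Vol) has no inconsistent vector, so it is selected even though two vectors remain ambiguous.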
29Learning qualitative tree
- For every possible split, split the examples into two subsets, find the most consistent QCF for both subsets and select the split minimizing tree error-cost (based on MDL)
- Algorithm ep-QUIN uses every pair of examples
- An improvement: the heuristic QUIN algorithm, which also considers locality and consistency of qualitative change vectors
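The split search can be sketched in a few lines, in the spirit of ep-QUIN (every pair of examples yields a change vector). This is a simplified illustration, not the published algorithm: the cost of a subset is just the number of change vectors inconsistent with its best QCF, not the MDL-based error-cost, and pairs with no change in the class are skipped. The data set (class = |x| over one attribute) and all function names are hypothetical.

```python
from itertools import combinations, product


def sign(d):
    return (d > 0) - (d < 0)


def inconsistencies(qcf, examples):
    """Count pairs of examples whose change vector contradicts the QCF."""
    count = 0
    for a, b in combinations(examples, 2):
        class_change = sign(b[-1] - a[-1])
        if class_change == 0:
            continue  # simplification: ignore pairs with no class change
        effects = {s * sign(b[i] - a[i]) for i, s in qcf.items()} - {0}
        if len(effects) == 1 and effects.pop() != class_change:
            count += 1
    return count


def best_qcf(examples, n_attrs):
    """Most consistent QCF for a subset, returned as (cost, qcf)."""
    qcfs = [{i: s for i, s in enumerate(signs) if s}
            for signs in product((0, 1, -1), repeat=n_attrs) if any(signs)]
    return min(((inconsistencies(q, examples), q) for q in qcfs), key=lambda t: t[0])


def best_split(examples, n_attrs):
    """Try each midpoint threshold on each attribute; keep the cheapest split.

    Returns (cost, attribute index, threshold, left QCF, right QCF).
    """
    best = None
    for attr in range(n_attrs):
        values = sorted({e[attr] for e in examples})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [e for e in examples if e[attr] <= thr]
            right = [e for e in examples if e[attr] > thr]
            (cost_l, qcf_l) = best_qcf(left, n_attrs)
            (cost_r, qcf_r) = best_qcf(right, n_attrs)
            if best is None or cost_l + cost_r < best[0]:
                best = (cost_l + cost_r, attr, thr, qcf_l, qcf_r)
    return best


# Hypothetical data: one attribute x, class y = |x|.
DATA = [(x, abs(x)) for x in (-3.0, -2.0, -1.0, 0.5, 1.5, 2.5)]

print(best_qcf(DATA, 1))    # no single QCF fits the whole set
print(best_split(DATA, 1))  # a split near zero separates M-(x) from M+(x)
```

Ties between equally cheap splits are broken by scan order here; the MDL-based cost in the bullets above would also weigh tree size and ambiguity, which this sketch leaves out.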
30Experimental evaluation in artificial domains
- On a set of artificial domains with uniformly distributed attributes; 2 irrelevant attributes
- Results by QUIN better than ep-QUIN
- In simple domains QUIN finds qualitative relations corresponding to our intuition
31QUIN in bicycle riding
- Control task
- drive a bike from the start to the goal position
- the bike's speed is assumed constant
- difficult because balancing and goal-aiming must be performed simultaneously
- Controlled by torque applied to the handlebars
- State: goalAngle, goalDist, ω, dω, θ, dθ
- QUIN: θdes = f(State)
32Induced qualitative strategy
[Figure: induced qualitative tree. Splits: ≤ 0.015 / > 0.015 and ≤ -0.027 / > -0.027. Leaves: M+,+,-(ω, dω, goalAngle), M+,+,-(ω, dω, goalAngle), M+,+(ω, dω). Annotation: same QCFs]
33Induced qualitative strategy
goalAngle near zero
M+,+(ω, dω)
M+,+,-(ω, dω, goalAngle)
Balancing and goal-aiming
Goal-aiming: turn the front wheel away from the
If the bike starts falling over, then turn the front wheel in the direction of the fall
34Transforming qualitative into quantitative
- Transform QCFs into real-valued functions by using simple domain knowledge
- maximal front wheel deflection
- drive straight if the bike is aiming at the goal: f(0,0,0) = 0
- balancing is more important than aiming at the goal
- 400 randomly generated quantitative strategies: 59.2% successful
- Test of robustness
- Change in the start state (58% successful)
- Random displacement of the bicyclist from the mass center (26% successful)
35QUIN in crane domain
- Crane control requires trolley and rope control
- Experiments with traces of 2 operators using different control styles
- Rope control
- QUIN: Ldes = f(X, dX, Φ, dΦ, dL)
- Often a very simple strategy is induced
Ldes = M+(X): bring down the load as the trolley moves from the start to the goal position
36Trolley control
- QUIN: dXdes = f(X, Φ, dΦ)
- More diversity in the induced strategies
Enables reconstruction of individual differences in control styles
[Figure: induced qualitative tree. Splits: X < 20.7, X < 29.3, X < 60.1, dΦ < -0.02. Visible leaf: M+,+,-(X, Φ, dΦ)]
37Role of human intervention
- Approach facilitates the use of user knowledge
- In our experiments the following types of human intervention were used
- Selection of the dependent trajectory variable
- Disregarding some state variables
- Selection and analysis of induced equations
- Using domain knowledge in transforming qualitative into quantitative strategies
- According to empirical evidence, different (sensible) choices and uses of domain knowledge also give successful strategies
38Contributions of the thesis
- A decomposition of the behavioural cloning problem into the learning of a continuous generalized trajectory and the system's dynamics
- Modelling of human skill with symbolic and qualitative constraints
- QUIN algorithm for learning qualitative constraint trees
- Applying QUIN to skill reconstruction
- Experimental evaluation in several dynamic domains
39Further work
- Applying QUIN in different domains where qualitative models are preferred; QUIN improvements
- Qualitative simulation to generate possible explanations of a qualitative strategy
- Reducing the space of admissible controllers by qualitative reasoning
- Minimizing the trajectory constraints error in all the state variables would not require the selection of the dependent trajectory variable