1
Machine reconstruction of human control strategies
  • Dorian Šuc
  • Artificial Intelligence Laboratory
  • Faculty of Computer and Information Science
  • University of Ljubljana, Slovenia

2
Overview
  • Skill reconstruction and behavioural cloning
  • The learning problem
  • A problem decomposition for behavioural cloning
    (indirect controllers, experiments, advantages)
  • Symbolic and qualitative skill reconstruction
  • Learning qualitative strategies: the QUIN algorithm
  • QUIN in skill reconstruction
  • Conclusions

3
Skill reconstruction and behavioural cloning
  • Motivation
  • understanding of the human skill
  • development of an automatic controller
  • ML approach to skill reconstruction: learn a
    control strategy from the data logged from
    skilled human operators (execution traces). Later
    called behavioural cloning (Michie, 93).
  • Early work: Chambers and Michie (69); learning
    control by imitation also by Donaldson (60, 64)

4
Behavioural cloning: some applications
  • Original approach: clones usually induced as a
    direct mapping from states to actions, in the form
    of trees or rule sets
  • Successfully used in domains such as:
  • pole balancing (Michie et al., 90)
  • piloting (Sammut et al., 92; Camacho, 95)
  • container cranes (Urbančič, 94)
  • production line scheduling (Kerr and Kibira, 94)
  • Reviews in Sammut (96), Bratko et al. (98)

5
Learning problem
  • Execution traces used as examples for ML to
    induce:
  • a control strategy (comprehensible, symbolic)
  • an automatic controller (criterion of success)
  • Operator's execution trace:
  • a sequence of system states and the corresponding
    operator's actions, logged to a file at a certain
    frequency
  • Reconstruction of human control skill:
  • Skill: know-how at the subsymbolic level,
    operational
  • Strategy: explicitly described know-how at the
    symbolic level

6
Container crane
Used in ports for load transportation.
Control forces: Fx, FL. State: X, dX, Φ, dΦ, L, dL.
Based on previous work of Urbančič (94). Control
task: transport the load from the start to the
goal position.
7
Learning problem, cont.
Fx     FL  X     dX     Φ      dΦ     L      dL
0      0   0.00  0.00   0.00   0.00   20.00  0.00
2500   0   0.00  0.00  -0.00  -0.01   20.00  0.00
6000   0   0.00  0.01  -0.01  -0.02   20.00  0.00
10000  0   0.02  0.10  -0.07  -0.27   20.00  0.00
14500  0   0.12  0.31  -0.32  -0.85   20.00  0.00
14500  0   0.35  0.59  -0.95  -1.49   20.00  0.01
...
8
Problems of the original approach
  • Difficulties observed with the original approach:
  • No guarantee of inducing a successful clone with
    high probability (Urbančič and Bratko, 94)
  • Low robustness of clones
  • Comprehensibility: clones hard to understand

Michie (93, 95) suggests that a kind of problem
decomposition could be helpful: learning from
exemplary performance requires more than mindless
imitation. Recent approaches to behavioural
cloning: (Stirling, 95; Bain and Sammut, 99;
Camacho, 2000)
9
Related work
  • Leech (86): probably the first goal-structured
    learning of control
  • CHURPs (Stirling, 95): separates control skills
    into planning and actuation phases; focuses on the
    planning component; assumes the goals are given
  • GRAIL (Bain and Sammut, 99): learns goals by
    decision trees and effects by abduction
  • Incremental Correction Model (Camacho, 2000):
    homeostatic and achievable goals; parametrised
    decision trees to learn goals; wrapper approach

10
Our approach
  • Our goals:
  • transparency of the induced strategies
  • robust and successful controllers
  • Ideas:
  • Learning problem decomposition: (a) learning of
    the constraints on the operator's trajectories, (b)
    learning of the system's dynamics
  • Generalized trajectory as a continuous subgoal
  • Symbolic and qualitative constraints, use of
    domain knowledge
  • Differences from related approaches:
  • continuous generalized trajectory
  • qualitative strategies

11
Experimental domains
  • Container crane
  • we used execution traces from (Urbančič, 94)
  • Acrobot (DeJong, 95; Sutton, 96)
  • two-link pendulum in a gravitational field;
    swing-up task
  • Bicycle riding (Randløv and Alstrøm, 98)
  • drive the bike from the start to the goal
    position; requires simultaneous balancing
    and goal-aiming
  • Simulators used in all experiments
  • Measure of success:
  • time to accomplish the task

12
Operator's trajectory
  • A sequence of the states from an execution trace
  • A path in the state space

Operator's trajectory of the trolley velocity
(dX) in the space of X, Φ and dX
13
Generalized trajectory
Induced constraints on the operator's trajectory
  • Constraints can be represented as
  • trees
  • equations
  • qualitative constraints

14
Qualitative and quantitative strategy
  • Quantitative strategy: given with precise
    numerical values or numeric constraints (decision
    tree, equation)
  • Qualitative strategy: may also use qualitative
    constraints. A qualitative strategy defines a set
    of quantitative strategies
  • We use qualitatively constrained functions
    (QCFs): monotonicity constraints as used in
    qualitative reasoning

15
Qualitatively constrained functions
  • M+(x): an arbitrary monotonically increasing
    function of x
  • A QCF is a generalization of M+, similar to the
    qualitative proportionality predicates used in
    QPT (Forbus, 84)

Gas in a container: Pres = c · Temp / Vol, c = nR > 0
QCF: Pres = M+,-(Temp, Vol)

Temp std, Vol ↑   ⇒  Pres ↓
Temp ↑,  Vol std  ⇒  Pres ↑
Temp ↑,  Vol ↓    ⇒  Pres ↑
Temp ↓,  Vol ↑    ⇒  Pres ↓
Temp ↑,  Vol ↑    ⇒  Pres ?  (ambiguous)
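Read formally (stated here for precision, consistent with the example above), a QCF constrains only the signs of the partial influences, not their magnitudes:

    Pres = M+,-(Temp, Vol)  ⇔  Pres = f(Temp, Vol) for some f with
                                ∂f/∂Temp > 0 and ∂f/∂Vol < 0

With Temp and Vol both increasing, the two influences oppose each other, so the change of Pres is ambiguous, as in the last row above.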
16
Problem decomposition
17
Direct and indirect controllers
Direct controllers: the original approach, BOXES, ASE/ACE
  • Indirect controllers: our approach (a sketch of
    the two interfaces follows below)
  • Also CHURPs (Stirling, 95), GRAIL (Bain and Sammut,
    99), ICM (Camacho, 2000)
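To make the distinction concrete, here is a hypothetical sketch of the two controller interfaces (illustrative only; clone, trajectory_model and inverse_dynamics are assumed stand-ins, not names from the thesis):

    # Direct controller: the induced clone maps states straight to actions.
    def direct_controller(state, clone):
        return clone(state)                      # action = f(state)

    # Indirect controller: first predict the desired next value on the
    # generalized trajectory, then derive an action that achieves it
    # from a model of the system's dynamics.
    def indirect_controller(state, trajectory_model, inverse_dynamics):
        desired = trajectory_model(state)        # e.g. dXdes = g(X, Phi)
        return inverse_dynamics(state, desired)  # action realizing desired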

18
Robustness of direct and indirect controllers
against learning error
  • Experiment: modelling the learning of direct and
    indirect controllers with some learning error
    (see the sketch below)
  • direct controllers: correct action + noise(σ)
  • indirect controllers: correct trajectory +
    noise(σ)
  • Two error models:
  • Gaussian noise
  • Biased Gaussian noise (all errors in the same
    direction)
  • Simple, deterministic, discrete-time system
  • Control task: reach and maintain the goal value
    Xg
  • Performance criterion: controller error in Xg
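A minimal sketch of this experiment (my own illustrative reconstruction, not the thesis code), assuming a one-dimensional integrator system x' = x + a and a nominal trajectory from 0 to Xg; the learning error enters either the action (direct) or the desired trajectory value (indirect):

    import random

    def run(kind, biased, sigma=0.05, steps=100, xg=1.0):
        # nominal trajectory: move from 0 to xg in equal increments
        x_star = [xg * t / steps for t in range(steps + 1)]
        x, total_err = 0.0, 0.0
        for t in range(steps):
            e = random.gauss(0.0, sigma)
            if biased:
                e = abs(e)                 # biased: all errors one-sided
            if kind == "direct":
                # noisy state-to-action mapping: the error enters the
                # action, so it accumulates in the state
                a = (x_star[t + 1] - x_star[t]) + e
            else:
                # noisy trajectory constraint tracked via the known
                # dynamics: the error does not accumulate across steps
                a = (x_star[t + 1] + e) - x
            x += a                         # system dynamics: x' = x + a
            total_err += abs(x - x_star[t + 1])
        return total_err / steps

    for kind in ("direct", "indirect"):
        for biased in (False, True):
            avg = sum(run(kind, biased) for _ in range(100)) / 100
            print(kind, "biased" if biased else "gaussian", round(avg, 3))

Averaged over runs, biased error drifts the direct controller off the trajectory linearly in time, while the indirect controller stays within about one noise standard deviation of it, which is the effect summarized on the next slide.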

19
Robustness of direct and indirect controllers
against learning error (2)
Biased noise affects direct controllers much more
20
Possible advantages of indirect controllers
  • Less prone to departure from the operator's
    trajectory
  • More robust against changes in the system's
    dynamics and small changes in the task
  • generalizing the trajectory is often easier than
    generalizing the actions
  • Generalized trajectory often easier to understand
    (fewer details)

21
Symbolic and qualitative skill reconstruction
GoldHorn (Križman, 98)
LWR (Atkeson et al., 97)
  • Experiments in the crane and acrobot domains

22
Experiments in the crane domain
  • GoldHorn induced the generalized trajectory of
    the trolley velocity:
  • dXdes = 0.902 + 0.018 X² + 0.090 X + 0.050 Φ

Qualitative strategy: if X ≤ Xmid then dX =
M+,+(X, Φ), else dX = M-,+(X, Φ)
23
Transforming qualitative into quantitative
strategies
  • By concretizing qualitative parameters into real
    numeric values or real-valued functions
  • First experiment: using randomly generated
    functions satisfying the qualitative constraints
    and additional domain knowledge (a sketch of the
    generation step follows below)
  • maximal and minimal values of the state variables
  • the trolley starts towards the goal
  • the trolley stops at the goal
  • Second experiment: using additional domain
    knowledge
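A minimal sketch of the generation step, assuming (for illustration only) that candidate functions are restricted to linear combinations; random_qcf_instance is a hypothetical helper, and the real experiments additionally filtered candidates using the domain knowledge listed above:

    import random

    # Sample a random linear function consistent with a QCF: each argument
    # gets a random coefficient whose sign matches the monotonicity
    # constraint on that argument.
    def random_qcf_instance(signs, lo=0.1, hi=2.0):
        coeffs = [random.uniform(lo, hi) * s for s in signs]
        return lambda *xs: sum(c * x for c, x in zip(coeffs, xs))

    # a candidate quantitative strategy for dX = M+,+(X, Phi):
    dx_des = random_qcf_instance([+1, +1])
    print(dx_des(10.0, 0.05))   # value increases with both X and Phi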

24
Efficiency of the qualitative strategy
  • The results show that the qualitative strategy is:
  • general (the proper selection of qualitative
    parameters is not crucial)
  • successful; it offers space for controller
    optimization
  • Similar experiments in the acrobot domain

25
Qualitative induction
  • Motivation: our experiments with qualitative
    strategies (crane, acrobot)
  • The usual classification learning problem, but
    learning qualitative trees:
  • leaves contain qualitatively constrained functions
    (QCFs); a QCF gives constraints on the class change
    in response to a change in the attributes
  • internal nodes (splits) define a partition of the
    state space into areas with common qualitative
    behaviour of the class variable

26
Qualitatively constrained function (QCF)
  • M+(x): an arbitrary monotonically increasing
    function of x
  • A QCF is a generalization of M+, similar to the
    qualitative proportionality predicates used in
    QPT (Forbus, 84)

Gas in a container: Pres = c · Temp / Vol, c = nR > 0
QCF: Pres = M+,-(Temp, Vol)

Temp std, Vol ↑   ⇒  Pres ↓
Temp ↑,  Vol std  ⇒  Pres ↑
Temp ↑,  Vol ↓    ⇒  Pres ↑
Temp ↓,  Vol ↑    ⇒  Pres ↓
Temp ↑,  Vol ↑    ⇒  Pres ?  (ambiguous)
27
Learning QCFs
Pres = 2 · Temp / Vol

Temp    Vol    Pres
315.00  56.00  11.25
315.00  62.00  10.16
330.00  50.00  13.20
300.00  50.00  12.00
300.00  55.00  10.90

  • Learning of the most consistent QCF:
  • For each pair of examples form a qualitative
    change vector
  • Select the QCF with minimal error-cost
    (illustrated in the sketch below)
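The two steps can be illustrated with a small self-contained sketch (my own reconstruction of the idea, not the thesis implementation; the error-cost is simplified to counts of inconsistent and ambiguous pairs):

    from itertools import combinations

    # the example set above: Pres = 2 * Temp / Vol
    examples = [(315.0, 56.0, 11.25), (315.0, 62.0, 10.16),
                (330.0, 50.0, 13.20), (300.0, 50.0, 12.00),
                (300.0, 55.0, 10.90)]

    def sign(x, eps=1e-9):
        return 0 if abs(x) < eps else (1 if x > 0 else -1)

    # Count inconsistent and ambiguous qualitative change vectors for the
    # QCF whose per-attribute monotonicity signs are given, e.g. (1, -1)
    # stands for M+,-(Temp, Vol).
    def qcf_cost(signs):
        incons = ambig = 0
        for a, b in combinations(examples, 2):
            q_attrs = [sign(b[i] - a[i]) for i in range(2)]
            q_class = sign(b[2] - a[2])
            # class-change directions predicted by each changing attribute
            preds = {s * q for s, q in zip(signs, q_attrs) if q != 0}
            if q_class == 0 or not preds or preds == {q_class}:
                continue                   # no change, or consistent pair
            if q_class in preds:
                ambig += 1                 # opposing influences: ambiguous
            else:
                incons += 1                # pair contradicts the QCF
        return incons, ambig

    for signs in [(1, 1), (1, -1), (-1, 1), (-1, -1)]:
        print(signs, qcf_cost(signs))      # M+,- has no inconsistent pairs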

28
Learning QCFs
QCF               Incons.  Amb.
M+(Temp)          3        1
M-(Temp)          2,4      1
M+(Vol)           1,2,3    /
M-(Vol)           4        /
M+,+(Temp, Vol)   1,3      2
M+,-(Temp, Vol)   /        3,4
M-,+(Temp, Vol)   1,2      3,4
M-,-(Temp, Vol)   4        2

(cells list the qualitative change vectors that are inconsistent or
ambiguous for each QCF; one such vector: qTemp=neg, qVol=neg, qPres=pos)

Select the QCF with minimal QCF error-cost
29
Learning qualitative tree
  • For every possible split, divide the examples into
    two subsets, find the most consistent QCF for
    both subsets, and select the split minimizing the
    tree-error cost (based on MDL); a sketch follows
    below
  • The ep-QUIN algorithm uses every pair of examples
  • An improvement: the heuristic QUIN algorithm,
    which also considers the locality and consistency
    of qualitative change vectors
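A self-contained sketch of the split search in the ep-QUIN style (again my own illustration; the real algorithm scores splits with an MDL-based tree-error cost rather than the raw inconsistency count used here):

    from itertools import combinations

    def sign(x, eps=1e-9):
        return 0 if abs(x) < eps else (1 if x > 0 else -1)

    # Error-cost of the most consistent QCF over a subset of examples
    # (attributes first, class value last in each tuple): the minimal
    # number of inconsistent example pairs over all sign combinations.
    def best_qcf_cost(examples, n_attrs):
        best = None
        for mask in range(2 ** n_attrs):
            signs = [1 if mask & (1 << i) else -1 for i in range(n_attrs)]
            incons = 0
            for a, b in combinations(examples, 2):
                q_class = sign(b[-1] - a[-1])
                preds = {s * sign(b[i] - a[i])
                         for i, s in enumerate(signs)} - {0}
                if q_class != 0 and preds and q_class not in preds:
                    incons += 1
            best = incons if best is None else min(best, incons)
        return best

    # For every threshold on every attribute, fit the most consistent QCF
    # to both subsets and keep the split with minimal total error-cost.
    def best_split(examples, n_attrs):
        best = None
        for i in range(n_attrs):
            vals = sorted({e[i] for e in examples})
            for v1, v2 in zip(vals, vals[1:]):
                t = (v1 + v2) / 2
                left = [e for e in examples if e[i] <= t]
                right = [e for e in examples if e[i] > t]
                cost = (best_qcf_cost(left, n_attrs)
                        + best_qcf_cost(right, n_attrs))
                if best is None or cost < best[0]:
                    best = (cost, i, t)
        return best   # (cost, attribute index, threshold)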

30
Experimental evaluation in artificial domains
  • On a set of artificial domains with uniformly
    distributed attributes and 2 irrelevant attributes
  • Results: QUIN better than ep-QUIN
  • In simple domains QUIN finds qualitative
    relations corresponding to our intuition

31
QUIN in bicycle riding
  • Control task:
  • drive the bike from the start to the goal position
  • the bike's speed is assumed constant
  • difficult because balancing and goal-aiming must
    be performed simultaneously
  • Controlled by the torque applied to the handlebars
  • State: goalAngle, goalDist, ω, dω, θ, dθ
  • QUIN: θdes = f(State)

32
Induced qualitative strategy
goalAngle ≤ 0.015
├─ yes: goalAngle ≤ -0.027
│   ├─ yes: θdes = M+,+,-(ω, dω, goalAngle)
│   └─ no:  θdes = M+,+(ω, dω)
└─ no:  θdes = M+,+,-(ω, dω, goalAngle)

(the two M+,+,- leaves are the same QCF)
33
Induced qualitative strategy
goalAngle near zero?
├─ yes: θdes = M+,+(ω, dω)                (balancing)
└─ no:  θdes = M+,+,-(ω, dω, goalAngle)   (balancing and goal-aiming)

Goal-aiming: turn the front wheel away from the goal.
If the bike starts falling over, turn the front wheel in the direction of the fall.
34
Transforming qualitative into quantitative
strategies
  • Transform QCFs into real-valued functions by
    using simple domain knowledge:
  • maximal front wheel deflection
  • drive straight if the bike is aiming at the goal:
    f(0,0,0) = 0
  • balancing is more important than aiming at the
    goal
  • Of 400 randomly generated quantitative strategies,
    59.2% were successful
  • Test of robustness:
  • Change in the start state (58% successful)
  • Random displacement of the bicyclist from the
    mass center (26% successful)

35
QUIN in crane domain
  • Crane control requires trolley and rope control
  • Experiments with traces of 2 operators using
    different control styles
  • Rope control:
  • QUIN: Ldes = f(X, dX, Φ, dΦ, dL)
  • Often a very simple strategy is induced

Ldes = M+(X): bring down the load as the trolley
moves from the start to the goal position
36
Trolley control
  • QUIN: dXdes = f(X, Φ, dΦ)
  • More diversity in the induced strategies

Enables reconstruction of individual differences
in control styles.

[Figure: two induced qualitative trees for dXdes, one per operator;
splits on X (X < 20.7, X < 29.3, X < 60.1) and dΦ (dΦ < -0.02);
leaf QCFs include M+(X), M+,+,-(X, Φ, dΦ), M-(X), M+(Φ) and M-,+(X, Φ)]
37
Role of human intervention
  • The approach facilitates the use of user knowledge
  • In our experiments the following types of human
    intervention were used:
  • Selection of the dependent trajectory variable
  • Disregarding some state variables
  • Selection and analysis of induced equations
  • Using domain knowledge in transforming
    qualitative into quantitative strategies
  • Empirically, different (sensible) choices and uses
    of domain knowledge also give successful strategies

38
Contributions of the thesis
  • A decomposition of the behavioural cloning
    problem into learning of a continuous generalized
    trajectory and of the system's dynamics
  • Modelling of human skill with symbolic and
    qualitative constraints
  • The QUIN algorithm for learning qualitative
    constraint trees
  • Applying QUIN to skill reconstruction
  • Experimental evaluation in several dynamic domains

39
Further work
  • Applying QUIN in other domains where qualitative
    models are preferred; QUIN improvements
  • Qualitative simulation to generate possible
    explanations of a qualitative strategy
  • Reducing the space of admissible controllers by
    qualitative reasoning
  • Minimizing the trajectory-constraints error in
    all the state variables, which would not require
    the selection of the dependent trajectory variable