1
RELATIONAL REINFORCEMENT LEARNING
  • PLL Seminar - Talk 2
  • Tayfun Gürel

2
Overview
  • Reinforcement Learning Background
  • Need for Relational Representations
  • Logical Decision Trees - TILDE
  • Q-RRL: integration of RL with TILDE
  • P-RRL
  • Experimental Results
  • Final Discussion and Conclusion

3
The standard reinforcement-learning model
[Figure: the agent-environment loop; the agent receives input i (state s) and reward r, and sends back action a]
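This loop is standardly formalized as a Markov decision process; a textbook statement of that formalization (the slide itself shows only the diagram):

  M = (S, A, T, R), \quad T(s,a,s') = \Pr(s_{t+1} = s' \mid s_t = s, a_t = a), \quad r_t = R(s_t, a_t)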
4
MDP EXAMPLE
[Figure: an example MDP with its states and rewards and its transition function; the slide also showed the Bellman equation and greedy policy selection]
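The two equations named above have these standard forms (a reconstruction; the slide's equation images are not recoverable):

  V^*(s) = \max_a \Big[ R(s,a) + \gamma \sum_{s' \in S} T(s,a,s')\, V^*(s') \Big]    (Bellman equation)

  \pi(s) = \arg\max_a \Big[ R(s,a) + \gamma \sum_{s' \in S} T(s,a,s')\, V^*(s') \Big]    (greedy policy selection)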
5
Value Iteration Algorithm
An alternative iteration (Singh, 1993), which is important for Q-learning
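Standard value iteration, together with the state-action form that the Singh (1993) remark presumably refers to (a hedged reconstruction: iterating over Q-values rather than V-values is what Q-learning later samples):

  V_{k+1}(s) = \max_a \Big[ R(s,a) + \gamma \sum_{s'} T(s,a,s')\, V_k(s') \Big]

  Q_{k+1}(s,a) = R(s,a) + \gamma \sum_{s'} T(s,a,s')\, \max_{a'} Q_k(s',a')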
6
Q-Learning (Model Free)
Action selection is done with the following Q-exploration strategy.
[Figure: the Q-learning update rule and the exploration formula]
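The standard model-free update, and the Boltzmann-style exploration rule that Dzeroski et al. (2001) use for action selection (reconstructed, since the slide shows the formulas only as images):

  Q(s,a) \leftarrow Q(s,a) + \alpha \big[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \big]

  \Pr(a_i \mid s) = T^{Q(s,a_i)} \Big/ \sum_j T^{Q(s,a_j)}    (T is the temperature)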
7
P-Learning
  • The Q-function encodes the distance from the goal
  • Instead, code whether an (s,a) pair is optimal
  • Use the P-exploration strategy
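In Dzeroski et al. (2001) the P-function simply flags optimal state-action pairs, and P-exploration mirrors Q-exploration (a reconstruction of the missing formulas):

  P(s,a) = 1 \text{ if } a \in \arg\max_{a'} Q(s,a'), \quad P(s,a) = 0 \text{ otherwise}

  \Pr(a_i \mid s) = T^{P(s,a_i)} \Big/ \sum_j T^{P(s,a_j)}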

8
Relational Representations
  • In most applications the state space is too large
  • Generalization over states is essential
  • Many states are similar in some aspects
  • Representations for RL have to be enriched to support generalization
  • → RRL was initiated (Dzeroski, De Raedt, Blockeel 1998)

9
Blocks World
An action move(a,b) has the precondition clear(a) ∈ S and clear(b) ∈ S
s1 = {clear(a), clear(b), on(b,c), on(c,floor), on(a,floor)}
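A minimal Prolog sketch of this state and precondition check (my own encoding, not taken from the slide):

  % State s1 as facts:
  on(b, c).
  on(c, floor).
  on(a, floor).
  clear(a).
  clear(b).

  % move(A,B) is applicable when both A and B are clear and distinct:
  applicable(move(A, B)) :- clear(A), clear(B), A \= B.

  % ?- applicable(move(a, b)).   succeeds in s1.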
10
Relational Representations
  • With relational representations for states:
  • Abstraction from details, e.g.
    qvalue(0.72) :- goal_unstack, numberofblocks(A),
      action_move(B,C), height(D,E), E = 2, on(C,D), !.
  • Flexibility to goal changes: retraining from the beginning is not necessary
  • Transfer of experience to more complex domains

11
Relational Reinforcement Learning
  • How does it work?
  • An integration of RL with ILP
  • Do forever:
  • use Q-learning to generate sample Q-values for sample state-action pairs
  • generalize them using ILP (in this case TILDE)

12
TILDE (Top-Down Induction of Logical Decision Trees)
  • A generalization of the Q/P-values is represented by a logical decision tree
  • Logical Decision Trees:
  • nodes are first-order logic atoms used as tests (Prolog queries), e.g. on(A,c): is there any block on c?
  • training data is a relational database or a Prolog knowledge base

13
Logical Decision Tree vs. Decision Tree
[Figure: a propositional decision tree and a logical decision tree, each deciding whether the blocks are stacked]
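A logical decision tree executes as an ordered list of Prolog clauses; a hypothetical rendering of such a stacked/unstacked test for a three-block world (illustrative only, not read off the lost figure):

  % One stack iff some block sits on a block that itself sits on
  % another block (floor excluded); assumes on/2 facts as in s1 above.
  class(stacked) :- on(A, B), on(B, C), C \= floor, !.
  class(unstacked).

The cut (!) makes the clauses mutually exclusive, exactly as the yes/no branches of the tree are.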
14
TILDE Algorithm
  • Declarative bias, e.g. the mode declaration on(+,-)
  • Background knowledge: a Prolog program
[Figure: the TILDE algorithm and an example part of the background program]
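A hypothetical fragment of such a background program (my own example; the slide's figure is not recoverable):

  % Derived predicates over the basic on/2 facts:
  height(floor, 0).
  height(Block, H) :- on(Block, Below), height(Below, H0), H is H0 + 1.

  above(X, Y) :- on(X, Y).
  above(X, Y) :- on(X, Z), above(Z, Y).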
15
TILDE Refinement Operators
  • How do we find all possible tests for a node?
  • Use: 1. the declarative bias
  •      2. the tests from the previous nodes
  • Note: predicates from the background knowledge must be declared in the declarative bias.

16
Finding possible tests
Assuming only the mode declaration on(+,-), starting from the query on(A,B) the refinement operator generates:
  • on(A,B), on(A,C)
  • on(A,B), on(B,C)
[Figure: the partially built tree; root test on(A,B), inner test on(B,C), leaves labelled unstacked, and an open node marked ? still to be refined]
For the open node (query on(A,B), on(B,C)) the refinement operator generates:
  1. on(A,B), on(B,C), on(A,D)
  2. on(A,B), on(B,C), on(B,D)
  3. on(A,B), on(B,C), on(C,D)
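A minimal Prolog sketch of this refinement step (my own illustration; vars_in/2 is a helper defined here, member/2 is a standard built-in):

  % Collect the variables occurring in a list of on/2 literals
  % (naive: duplicates are kept, so some refinements repeat).
  vars_in([], []).
  vars_in([on(X, Y) | T], [X, Y | R]) :- vars_in(T, R).

  % One refinement under mode on(+,-): the first argument must be an
  % existing variable, the second a fresh one.
  refine(Query, [on(V, _Fresh) | Query]) :-
      vars_in(Query, Vs),
      member(V, Vs).

  % ?- refine([on(A, B)], R).
  % R = [on(A, _), on(A, B)] ;
  % R = [on(B, _), on(A, B)].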
17
Q-RRL Algorithm
[Figure: the Q-RRL algorithm and the examples generated by Q-learning]
18
Logical regression tree generated by TILDE-RT
[Figure: the regression tree and its equivalent Prolog program]
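Such a program is an ordered list of clauses with Q-values at the leaves; a purely illustrative sketch for the goal on(a,b) (the tests and values here are invented, not recovered from the figure):

  % Assumes action_move/2, clear/1, on/2 facts describing the current
  % state-action pair:
  qvalue(1.00) :- action_move(a, b), clear(a), clear(b), !.   % goal move
  qvalue(0.90) :- action_move(X, floor), on(X, b), !.         % clears b
  qvalue(0.81).                                               % default leaf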
19
P-RRL algorithm
  • Idea: the P-function may be easier to learn
  • The Q-function encodes the distance from the goal; what happens if the number of blocks is changed?

20
P-RRL
Together with the induction of the Q-tree, perform induction of a P-tree
Use the P-exploration strategy
21
A P-RRL Result
[Figure: examples generated by P-RRL in an episode, and the decision tree generated by TILDE from these examples]
22
EXPERIMENTS
  • Tested for three different goals:
  • 1. one-stack
  • 2. on(a,b)
  • 3. unstack
  • Tested for the following settings as well:
  • 1. fixed number of blocks
  • 2. number of blocks changed after learning
  • 3. number of blocks changed while learning
  • P-RRL vs. Q-RRL

23
Results: fixed number of blocks
Accuracy: percentage of correctly classified (s,a) pairs (optimal vs. non-optimal)
[Figure: results compared against the accuracy of random policies]
24
Results: fixed number of blocks
25
Results: evaluating learned policies on a varying number of blocks
26
Results: varying the number of blocks while learning
STRANGE!
27
Results: varying the number of blocks while learning
After adding additional background knowledge
28
Conclusion
  • RRL shows satisfying initial results but needs more research
  • P-RRL is more successful when the number of blocks is increased (it generalizes better to more complex domains)
  • RRL does not work well for more complex goals and for large numbers of blocks

29
Discussion
  • Theoretical research proving why it works is still missing
  • LOMDPs were introduced (Kersting & De Raedt 2003)
  • A LOMDP consists of:
  • - a logical alphabet
  • - a set of abstract states (first-order conjunctions)
  • - abstract actions and transition functions

30
Discussion
  • It is proven that for every LOMDP and abstract policy, there is a corresponding MDP and policy.
  • Still open: why does RRL work?
  • Why does the optimal policy found for a LOMDP also work for the corresponding MDP?

31
Other RRL approaches
  • Symbolic Dynamic Programming (Reiter, 2001): a combination of dynamic programming with the situation calculus
  • Deictic Representations (Finney 2002): no FOL; variables like the-pen-in-my-hand

32
References
  • S. Dzeroski, L. De Raedt, K. Driessens. Relational Reinforcement Learning. Machine Learning, 43(1/2):7-52, 2001.
  • K. Kersting, L. De Raedt. Logical Markov Decision Programs. In L. Getoor and D. Jensen, editors, Working Notes of the IJCAI-2003 Workshop on Learning Statistical Models from Relational Data (SRL-03), pp. 63-70, August 11, 2003, Acapulco, Mexico.
  • M. van Otterlo. Relational Representations in Reinforcement Learning: Review and Open Problems. In Proceedings of the ICML'02 Workshop on Development of Representations, 2002.

33
References
  • L. P. Kaelbling, M. L. Littman, A. W. Moore. Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, 4:237-285, 1996.
  • S. Dzeroski, L. De Raedt, H. Blockeel. Relational Reinforcement Learning. In D. Page, editor, Proceedings of the 8th International Conference on Inductive Logic Programming, Lecture Notes in Artificial Intelligence, Vol. 1446, Springer, 1998.