Title: RELATIONAL REINFORCEMENT LEARNING
1 RELATIONAL REINFORCEMENT LEARNING
- PLL Seminar - Talk 2
- Tayfun Gürel
2 Overview
- Reinforcement Learning Background
- Need for Relational Representations
- Logical Decision Trees - TILDE
- Q-RRL: integration of RL with TILDE
- P-RRL
- Experimental Results
- Final Discussion and Conclusion
3 The standard reinforcement-learning model
- [Diagram: the agent receives an input i, the current state s, and a reward r from the environment, and responds with an action a.]
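A minimal sketch of this interaction loop, assuming an environment object with reset() and step(action) methods and a policy function; these names are illustrative, not taken from the slides:

    # Illustrative agent-environment loop: the agent sees state s and reward r,
    # and responds with action a.
    def run_interaction(env, policy, steps=100):
        s, r = env.reset(), 0.0
        for _ in range(steps):
            a = policy(s, r)       # agent chooses an action from its current input
            s, r = env.step(a)     # environment returns the next state and reward
        return s, r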
4 MDP EXAMPLE
- [Slide shows a worked MDP example: the states and rewards, the transition function, the Bellman equation, and greedy policy selection.]
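Written in standard MDP notation (transition function T, reward function R, discount factor gamma; the slide's own symbols are not recoverable), the Bellman equation and the greedy policy it induces are:

    V^*(s)   = \max_{a} \left[ R(s,a) + \gamma \sum_{s'} T(s,a,s') \, V^*(s') \right]
    \pi^*(s) = \arg\max_{a} \left[ R(s,a) + \gamma \sum_{s'} T(s,a,s') \, V^*(s') \right]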
5 Value Iteration Algorithm
- An alternative iteration (Singh, 1993), important for Q-learning
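A minimal value-iteration sketch, assuming the MDP is given explicitly as dictionaries R[s][a] (reward) and T[s][a] (a list of (next_state, probability) pairs); these names are illustrative, not from the slides. The bracketed expression is the Q-form of the update, i.e. the alternative iteration that Q-learning approximates without a model:

    def value_iteration(states, actions, T, R, gamma=0.9, eps=1e-6):
        V = {s: 0.0 for s in states}
        while True:
            delta = 0.0
            for s in states:
                # Q(s,a) = R(s,a) + gamma * sum_s' T(s,a,s') * V(s')
                q = [R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a])
                     for a in actions]
                new_v = max(q)
                delta = max(delta, abs(new_v - V[s]))
                V[s] = new_v
            if delta < eps:
                return V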
6 Q-Learning (Model Free)
- Action selection is done with a Q-value-based exploration strategy (Q-exploration).
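A minimal tabular Q-learning sketch. Softmax (Boltzmann) selection is shown as one common form of Q-exploration; the learning rate alpha, temperature tau, and the tuple-keyed Q table are assumptions for illustration, not the slides' own formulas:

    import math
    import random
    from collections import defaultdict

    Q = defaultdict(float)  # Q[(state, action)], initialised to 0

    def q_explore(state, actions, tau=1.0):
        # Boltzmann exploration: actions with higher Q-values are chosen more often.
        weights = [math.exp(Q[(state, a)] / tau) for a in actions]
        return random.choices(actions, weights=weights)[0]

    def q_update(s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
        # Model-free update towards r + gamma * max_a' Q(s', a').
        target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])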
7 P-Learning
- The Q-function encodes the distance from the goal.
- Instead, code whether an (s,a) pair is optimal (P = 1) or not (P = 0).
- Use the P-exploration strategy.
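A minimal sketch of deriving P-values from learned Q-values, under the reading that P(s,a) = 1 when a is an optimal action in s and 0 otherwise; the helper name and tolerance are illustrative:

    def p_values(Q, state, actions, tol=1e-9):
        # Mark an action optimal (P = 1) if its Q-value matches the best
        # Q-value available in this state; otherwise P = 0.
        best = max(Q[(state, a)] for a in actions)
        return {a: 1 if Q[(state, a)] >= best - tol else 0 for a in actions}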
8 Relational Representations
- In most applications the state space is too large.
- Generalization over states is essential: many states are similar in some respects.
- Representations for RL have to be enriched to allow generalization; RRL was initiated for this purpose (Dzeroski, De Raedt, Blockeel 1998).
9 Blocks World
- An action move(a,b) has the precondition clear(a) ∈ S and clear(b) ∈ S.
- Example state: s1 = {clear(b), clear(a), on(b,c), on(c,floor), on(a,floor)}
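A minimal sketch of this state and precondition in Python, with a state encoded as a set of ground facts (the tuple encoding is an assumption for illustration):

    # State s1 as a set of ground facts.
    s1 = {("clear", "a"), ("clear", "b"),
          ("on", "b", "c"), ("on", "c", "floor"), ("on", "a", "floor")}

    def can_move(state, x, y):
        # move(x, y) is applicable when clear(x) and clear(y) both hold in the state.
        return ("clear", x) in state and ("clear", y) in state

    print(can_move(s1, "a", "b"))  # True: both a and b are clear in s1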
10 Relational Representations
- With relational representations for states:
- Abstraction from details, e.g. a learned rule such as
  q_value(0.72) :- goal_unstack, numberofblocks(A), action_move(B,C), height(D,E), E2, on(C,D), !.
- Flexibility with respect to goal changes: retraining from the beginning is not necessary.
- Transfer of experience to more complex domains.
11 Relational Reinforcement Learning
- How does it work?
- An integration of RL with ILP:
- Do forever (see the sketch below):
  - Use Q-learning to generate sample Q-values for sample state-action pairs
  - Generalize them using ILP (in this case TILDE)
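A minimal sketch of that loop, with the episode runner and the TILDE induction step passed in as functions because they are not specified here; the names are illustrative:

    def relational_q_learning(run_episode, induce_tree, n_episodes=100):
        """Alternate Q-learning episodes with TILDE induction (Q-RRL outer loop)."""
        examples = []
        q_tree = None
        for _ in range(n_episodes):
            # One Q-learning episode, guided by the current Q-tree, yields
            # (state, action, q_value) training examples.
            examples.extend(run_episode(q_tree))
            # Generalize all examples collected so far into a logical regression tree.
            q_tree = induce_tree(examples)
        return q_tree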
12 TILDE (Top-Down Induction of Logical Decision Trees)
- A generalization of the Q/P-values is represented by a logical decision tree.
- Logical Decision Trees (a minimal representation is sketched below):
  - Nodes are first-order logic atoms used as Prolog queries (tests), e.g. on(A,c): is there any block on c?
  - Training data is a relational database or a Prolog knowledge base.
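A minimal sketch of how such a tree could be held in memory, with each internal node carrying a test (here a plain Python predicate standing in for a Prolog query over the state) and yes/no subtrees; the class names are illustrative:

    from dataclasses import dataclass
    from typing import Callable, Union

    @dataclass
    class Leaf:
        value: float                   # predicted Q/P value or class label

    @dataclass
    class Node:
        test: Callable[[set], bool]    # stand-in for a Prolog query used as a test
        yes: "Tree"
        no: "Tree"

    Tree = Union[Leaf, Node]

    def predict(tree: Tree, state: set) -> float:
        # Walk down the tree, branching on whether each test succeeds in the state.
        while isinstance(tree, Node):
            tree = tree.yes if tree.test(state) else tree.no
        return tree.value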
13 Logical Decision Tree vs. Decision Tree
- [Figure: a decision tree and a logical decision tree, each deciding whether the blocks are stacked.]
14 TILDE Algorithm
- Declarative bias, e.g. the mode declaration on(+,-)
- Background knowledge: a Prolog program. [Slide shows an example fragment of the program.]
15 TILDE Refinement Operators
- How to find all possible tests for a node?
- Use:
  1. the declarative bias
  2. the tests from the previous nodes
- Note: predicates from the background knowledge must be declared in the declarative bias.
16 Finding possible tests
- Assuming only the mode declaration on(+,-) and the current query on(A,B):
- The refinement operator generates
  - on(A,B), on(A,C)
  - on(A,B), on(B,C)
- [Figure: a partially built logical decision tree; the root tests on(A,B) (no: leaf "unstacked"), its yes-branch tests on(B,C) (no: leaf "unstacked"), and the remaining node "?" is still to be refined.]
- For that node the refinement operator generates (a sketch of the operator follows)
  1. on(A,B), on(B,C), on(A,D)
  2. on(A,B), on(B,C), on(B,D)
  3. on(A,B), on(A,C), on(C,D)
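A minimal sketch of this refinement step for the single mode declaration on(+,-): each refinement adds one on/2 literal whose first argument is a variable already in the query and whose second argument is a fresh variable (literals encoded as tuples for illustration; variable naming assumes the A, B, C, ... convention):

    def refine(query):
        # query: list of literals, e.g. [("on", "A", "B")]
        seen, old_vars = set(), []
        for lit in query:
            for v in lit[1:]:
                if v not in seen:
                    seen.add(v)
                    old_vars.append(v)
        fresh = chr(ord("A") + len(old_vars))   # next unused variable name
        # Mode on(+,-): first argument is an existing variable, second is new.
        return [query + [("on", v, fresh)] for v in old_vars]

    # refine([("on", "A", "B")]) yields on(A,B),on(A,C) and on(A,B),on(B,C).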
17 Q-RRL Algorithm
- [Figure: examples generated by Q-learning.]
18 Logical regression tree generated by TILDE-RT
- [Figure: the induced logical regression tree and its equivalent Prolog program.]
19 P-RRL algorithm
- Idea: the P-function may be easier to learn.
- The Q-function encodes the distance from the goal; what happens if the number of blocks is changed?
20 P-RRL
- Together with the induction of the Q-tree, perform induction of a P-tree.
- Use the P-exploration strategy.
21 A P-RRL Result
- [Figure: examples generated by P-RRL in an episode, and the decision tree generated by TILDE from these examples.]
22 EXPERIMENTS
- Tested for three different goals
- 1. one-stack
- 2. on(a,b)
- 3. unstack
- Tested for the following as well
- 1. Fixed number of blocks
- 2. Number of blocks changed after learning
- 3. Number of blocks changed while learning
- P-RRL vs. Q-RRL
23 Results: Fixed number of blocks
- Accuracy: percentage of correctly classified (s,a) pairs (optimal vs. non-optimal)
- For comparison: accuracy of random policies
24 Results: Fixed number of blocks
25 Results: Evaluating learned policies on a varying number of blocks
26 Results: Varying the number of blocks while learning
- STRANGE!
27 Results: Varying the number of blocks while learning
- After adding additional background knowledge
28 Conclusion
- RRL shows satisfying initial results but needs more research.
- P-RRL is more successful when the number of blocks is increased (it generalizes better to more complex domains).
- RRL does not work very well for more complex goals and for large numbers of blocks.
29 Discussion
- Theoretical research proving why RRL works is still missing.
- LOMDPs (Logical Markov Decision Programs) were introduced (Kersting, De Raedt 2003).
- A LOMDP consists of:
  - a logical alphabet
  - a set of abstract states (first-order conjunctions)
  - abstract actions and transition functions
30 Discussion
- It is proven that for every LOMDP and abstract policy there is a corresponding MDP and policy.
- Still open: why does RRL work?
- Why does the optimal policy found for a LOMDP also work for the corresponding MDP?
31 Other RRL approaches
- Symbolic Dynamic Programming (Reiter, 2001): a combination of dynamic programming with the situation calculus.
- Deictic representations (Finney 2002): no FOL; variables like the-pen-in-my-hand.
32 References
- S. Dzeroski, L. De Raedt, K. Driessens. Relational Reinforcement Learning. Machine Learning 43(1/2): 7-52, 2001.
- K. Kersting, L. De Raedt. Logical Markov Decision Programs. In L. Getoor and D. Jensen, editors, Working Notes of the IJCAI-2003 Workshop on Learning Statistical Models from Relational Data (SRL-03), pp. 63-70, August 11, Acapulco, Mexico, 2003.
- M. van Otterlo. Relational Representations in Reinforcement Learning: Review and Open Problems. Proceedings of the ICML'02 Workshop on Development of Representations, 2002.
33 References
- L. P. Kaelbling, M. L. Littman, A. W. Moore. Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, vol. 4, pp. 237-285, 1996.
- S. Dzeroski, L. De Raedt, H. Blockeel. Relational Reinforcement Learning. In D. Page, editor, Proceedings of the 8th International Conference on Inductive Logic Programming, Lecture Notes in Artificial Intelligence, vol. 1446, Springer, 1998.