1. How a Modeler's Conception of Rewards Influences a Model's Behavior
- Investigating ACT-R 6's utility learning mechanism
- Christian P. Janssen
- Wayne D. Gray
- Michael J. Schoelles
2. Temporal difference learning & ACT-R
- Temporal difference learning has recently been introduced as ACT-R's new utility learning mechanism (e.g., Fu & Anderson, 2004; Anderson, 2006, 2007; Bothell, 2005)
- Utility learning optimizes behavior so as to maximize the rewards that the model receives (see the sketch after this list)
- A model can
  - Receive rewards at different moments in time
  - Receive rewards of different magnitudes
- There are no guidelines for choosing when a reward should be given and what its magnitude should be
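To make the mechanism concrete, here is a minimal sketch of the temporal-difference utility update described by Anderson (2007): U(n) = U(n-1) + alpha * [R(n) - U(n-1)], where the effective reward R(n) is the reward's magnitude minus the time between the production's firing and the reward. The function names, the learning-rate value, and the trial loop below are illustrative assumptions, not ACT-R's own code.

    ALPHA = 0.2  # learning rate (ACT-R's :alpha parameter; default value assumed here)

    def effective_reward(reward_magnitude, time_to_reward):
        """Effective reward for one production: the external reward minus the
        time (in seconds) between the production's firing and the reward."""
        return reward_magnitude - time_to_reward

    def update_utility(utility, reward_magnitude, time_to_reward, alpha=ALPHA):
        """One TD update: U(n) = U(n-1) + alpha * [R(n) - U(n-1)]."""
        r = effective_reward(reward_magnitude, time_to_reward)
        return utility + alpha * (r - utility)

    # Hypothetical example: a reward of magnitude 10 is delivered 3 s after the
    # rule fired.  Over repeated trials the utility converges toward 10 - 3 = 7.
    u = 0.0
    for _ in range(50):
        u = update_utility(u, reward_magnitude=10.0, time_to_reward=3.0)
    print(round(u, 2))  # approximately 7.0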
3. New issues for ACT-R
- We studied two aspects of TD learning
  - When the reward is given
  - The magnitude of the reward
- This is a new issue for ACT-R
  - When the reward is given could already be varied in ACT-R 5
  - The magnitude of the reward could not be varied in ACT-R 5
- As we will show, the modeler's conception of rewards has a big influence on a model's behavior
- Case study: Blocks World task (Gray et al., 2006)
4. Why the Blocks World task?
- Previous work indicates that the utility learning mechanism is crucial for this task
- ACT-R 5 models (Gray, Sims, & Schoelles, 2005)
  - Regular ACT-R 5 cannot provide a good fit to the human data
  - Because rewards in ACT-R 5 are binary (i.e., successes and failures) and not scalar
- Ideal Performer Model (Gray et al., 2006)
  - A model outside of ACT-R that uses temporal difference learning provided a very good fit (Gray et al., 2006)
5. Blocks World task
6. Blocks World task
- Task: Copy the pattern in the target window by moving blocks from the resource window to the workspace window
7. Blocks World task
- Windows are covered with gray rectangles. Accessing information requires interaction with the interface.
8. Blocks World task
9. Blocks World task
10. Blocks World task
11. Blocks World task
- Information in the target window is only available after waiting for a lockout time
  - 0, 400, or 3200 milliseconds (between subjects)
12. Blocks World task: human data (Gray et al., 2006)
- The size of the lockout time influences human behavior
13. Blocks World task: modeling strategies
- Strategy: How many blocks do you plan to place after a visit to the target window?
  - 8 encode-x production rules
  - "Study x blocks"
  - Encode-1 through Encode-8
- The model learns the utility value of each production rule using ACT-R's temporal difference learning algorithm (see the sketch below)
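For intuition, a rough sketch of how a model could pick one of the eight encode-x rules on a given visit: ACT-R selects the production whose learned utility plus noise is highest. The noise parameter value and the flat starting utilities below are assumptions for illustration, not values from the published model.

    import math
    import random

    EGS = 0.5  # utility noise parameter (ACT-R's :egs); value assumed here

    def logistic_noise(s=EGS):
        """Sample noise from a logistic distribution with scale s."""
        u = random.uniform(0.001, 0.999)  # clip away the extreme tails
        return s * math.log(u / (1.0 - u))

    # Utilities of the eight strategy rules, learned over trials (start flat).
    utilities = {"encode-%d" % k: 0.0 for k in range(1, 9)}

    def choose_strategy(utilities):
        """Pick the production with the highest noisy utility."""
        noisy = {name: u + logistic_noise() for name, u in utilities.items()}
        return max(noisy, key=noisy.get)

    print(choose_strategy(utilities))  # e.g. "encode-3"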
14. Utility learning
- Utility learning requires the incorporation of rewards
- Two choices are crucial
  - When is the reward given?
  - What is the magnitude of the reward?
- After some experience, the utility of a production rule approximates (Anderson, 2007):
  utility ≈ magnitude of the reward - time between the rule firing and the moment the reward is given
15. Utility learning
- Choice 1: When is the reward given?
- Important because
  - The utility value has a linear relationship with the time at which the reward is given
- Choice in Blocks World (compare the sketch below)
  - "Once" model: update once, at the end of the trial
  - "Each" model: update each time that part of the task is completed
    - A (set of) block(s) has been placed and the model either returns to the target window to study more blocks, or finishes the trial
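A small numeric illustration (assumed timeline and reward value, not the model's actual parameters) of why this choice matters, using the relation from slide 14 that the effective reward equals its magnitude minus the time elapsed since the production fired.

    # One hypothetical trial with three target-window visits: an encode-x rule
    # fires at t = 0, 10 and 20 s, each visit's blocks are placed 8 s later,
    # and the trial ends at t = 28 s.
    firing_times = [0.0, 10.0, 20.0]   # when each encode-x rule fired
    subtask_done = [8.0, 18.0, 28.0]   # when that part of the task finished
    REWARD = 10.0                      # assumed reward magnitude

    # "Once" model: a single reward at the end of the trial (t = 28 s).
    once = [REWARD - (28.0 - t) for t in firing_times]

    # "Each" model: a reward every time part of the task is completed.
    each = [REWARD - (done - t) for t, done in zip(firing_times, subtask_done)]

    print(once)  # [-18.0, -8.0, 2.0]  -> early rules are heavily penalized for elapsed time
    print(each)  # [2.0, 2.0, 2.0]     -> every rule receives the same effective reward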
16. Utility learning
- Choice 2: Magnitude of the reward
- Important because
  - The utility value has a linear relationship with the magnitude of the reward
- But how to set this value?
  - Experimental tweaking? -> unfavorable
  - A fixed range of values (e.g., between 0 and 1)? -> difficult
  - Relate it to neurological data? -> not available for most models
17. Utility learning
- Choice 2: Magnitude of the reward
- Choice in Blocks World
  - Relate the reward to what might be important in the task (see the sketch below)
- Accuracy: the accuracy with which the task is performed. Options:
  - Success: blocks placed (once)
  - Success: blocks placed (each)
  - Success & Failure: blocks placed - blocks forgotten (each)
- Time: how much time does (part of) the task take? Options:
  - Time spent on the task: -1 x time spent (once)
  - Time spent waiting for a specific aspect of the task: -1 x lockout size x number of visits to the target window (once)
  - Number of blocks placed per second (each)
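As an illustration, one way these options could be turned into concrete reward magnitudes for a single hypothetical visit; the formulas are our reading of the labels above, not the published model code.

    # Assumed quantities for one part of a trial (illustrative numbers only).
    blocks_placed = 3       # blocks placed after this target-window visit
    blocks_forgotten = 1    # blocks encoded but not placed correctly
    time_spent = 15.0       # seconds spent on this part of the task
    lockout = 3.2           # lockout size in seconds
    visits = 1              # target-window visits during this part of the task

    rewards = {
        "success: blocks placed": blocks_placed,
        "success & failure: placed - forgotten": blocks_placed - blocks_forgotten,
        "-1 x time spent": -1 * time_spent,
        "-1 x lockout x visits": -1 * lockout * visits,
        "blocks placed per second": blocks_placed / time_spent,
    }
    print(rewards)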
18. Blocks World task: modeling strategies
- 6 models were developed
- Each model is run 6 times for each of the 3 experimental conditions
  - 0, 400, and 3200 milliseconds
- The models interact with the same interface as the human participants
19. Blocks World task: general results
- Each model has unique results
20. Blocks World task: general results
- What is the impact of
  - When the reward is given (once/each)
  - The concept of the reward (related to accuracy/time)
- Results are averaged over 3 models
21. Utility learning: impact of when the reward is given
22. Utility learning: impact of the concept of the reward
23. Comparison with ACT-R 5 (Gray, Sims, & Schoelles, 2005)
24. Conclusion
- Rewards can be given at different times during a trial and according to different concepts
- There are no guidelines for what the best choices are
- The Blocks World task suggests that rewards should
  - Be given once: the model can then optimize behavior over the entire task
  - Relate to the concept of time, because different strategy choices have a big impact on reward size
- Models of other tasks should show whether this is consistent
25. Conclusion
- This is not just a Blocks World issue
  - It is a general Computer Science / AI issue: representing a task in the right way is crucial (e.g., Russell & Norvig, 1995; Sutton & Barto, 1998)
  - Many experiments involve manipulations and measurements of accuracy and speed of performance
- This is a new issue for ACT-R
  - When the reward is given could already be varied in ACT-R 5
  - The magnitude of the reward could not be varied in ACT-R 5
26. Thank you for your attention
- Questions?
- More information
  - cjanssen@ai.rug.nl
  - www.ai.rug.nl/cjanssen
  - www.cogsci.rpi.edu/cogworks
- Poster session at CogSci 2008, Thursday, July 24th: "Cognitive Models of Strategy Shifts in Interactive Behavior" (session: Attention and Implicit Learning)
27. References
- Anderson, J. R. (2006). A new utility learning mechanism. Paper presented at the 2006 ACT-R Workshop.
- Anderson, J. R. (2007). How can the human mind occur in the physical universe? New York: Oxford University Press.
- Bothell, D. (2005). ACT-R 6 official release. Proceedings of the 12th ACT-R Workshop.
- Fu, W. T., & Anderson, J. R. (2004). Extending the computational abilities of the procedural learning mechanism in ACT-R. Proceedings of the 26th Annual Meeting of the Cognitive Science Society, 416-421.
- Gray, W. D., Schoelles, M. J., & Sims, C. R. (2005). Adapting to the task environment: Explorations in expected value. Cognitive Systems Research, 6(1), 27-40.
- Gray, W. D., Sims, C. R., Fu, W. T., & Schoelles, M. J. (2006). The soft constraints hypothesis: A rational analysis approach to resource allocation for interactive behavior. Psychological Review, 113(3), 461-482.
- Russell, S. J., & Norvig, P. (1995). Artificial intelligence: A modern approach. Upper Saddle River, NJ: Prentice-Hall.
- Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.