Title: The Metacognitive Loop and the Problem of Brittleness
1 The Metacognitive Loop and the Problem of Brittleness
Michael L. Anderson, University of Maryland
www.activelogic.org
Joint work with Don Perlis, Tim Oates, John Grant, Ken Hennacy, Darsana Josyula, Yuan Chong and Walid Gomaa
2 Perturbation Tolerance: A Goal for Intelligent Systems
- A perturbation is any change, whether in the world or in the system itself, that impacts performance.
- Perturbation tolerance is the ability of a system to quickly recover from perturbations.
- Perturbation intolerance has long been a major issue for intelligent systems.
- The roots of the problem: self-ignorance and brittleness.
3 Self-Ignorance
- A typical AI system has no notion of what it is or what it is doing, let alone what it should be doing or strive to be.
- So why does it surprise us when systems fail to do what they ought, and instead blindly follow their programming over the metaphorical (or literal) cliff?
- Examples: a DARPA Grand Challenge vehicle; a satellite.
4 Brittleness
- Self-awareness is of limited usefulness without a capacity for self-alteration.
- A perturbation-tolerant system should not only notice when it isn't behaving as it ought or achieving what it should, but be able to use this knowledge to change the way it operates.
5 The Metacognitive Loop
- Our approach to this very general problem has been to equip artificial agents with the ability to notice when something is amiss, assess the anomaly, and guide a solution into place.
- Because this basic strategy involves monitoring, reasoning about, and perhaps even altering one's own decision-making components, it is a metacognitive strategy, and we call the basic Note-Assess-Guide process the Metacognitive Loop (MCL).
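To make the Note-Assess-Guide cycle concrete, here is a minimal Python sketch; the names (Anomaly, mcl_step, assess, guide) are illustrative placeholders rather than parts of any system described in these slides.

```python
# Minimal sketch of one Note-Assess-Guide (MCL) pass.
# All names here are illustrative placeholders.
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Anomaly:
    """A mismatch between what the system expected and what it observed."""
    expected: object
    observed: object


def mcl_step(expectations: dict,
             observations: dict,
             assess: Callable[[Anomaly], str],
             guide: Callable[[str], None]) -> Optional[str]:
    # Note: compare observations against expectations.
    for key, expected in expectations.items():
        observed = observations.get(key)
        if observed != expected:
            anomaly = Anomaly(expected, observed)
            response = assess(anomaly)   # Assess: classify and choose a response
            guide(response)              # Guide: put the response into effect
            return response
    return None  # nothing amiss this cycle
```

In each application below, the "assess" and "guide" pieces are realized differently: logical reasoning in ALFRED, retraining in Robby, and policy changes in Chippy.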
6 Self-monitoring
- Self-monitoring for anomalies, and assessing and responding to those anomalies, is a better, more efficient, and ultimately more effective approach to perturbation tolerance than doing nothing, on the one hand, or trying to continually monitor and model the world, on the other.
- Why?
7 Self-monitoring (2)
- The world is huge; the system is small.
- If the world changes, but this change does not affect performance, who cares?
- Anomalies can help focus attention on which parts of the world need (re-)modeling, making modeling more tractable.
8 Learning
- We believe that efforts should be aimed at implementing mechanisms that help systems help themselves. The goal should be to increase their agency and freedom of action in responding to problems, instead of limiting it and hoping that circumstances do not stray from the anticipations of the system designer.
- Why?
9 Learning (2)
- Primarily because we don't think system designers are smart enough to anticipate every eventuality.
- But also because we think that self-aware, self-guided learning is the foundation of autonomy.
- Metacognitive learners would be advanced active learners, able to decide what, when, and how to learn (and when to stop).
10 Applications
In our ongoing work, we have found that including an MCL component can enhance the performance of, and speed learning in, different types of systems, including reinforcement learners, natural language human-computer interfaces, commonsense reasoners, deadline-coupled planning systems, robot navigation, and, more generally, repairing arbitrary direct contradictions in a knowledge base.
11 MCL Application 1: Active Logic
- Active Logic (AL) is a time-sensitive, contradiction-tolerant logical formalism for use by autonomous cognitive agents.
- Central to AL are special rules controlling the inheritance of beliefs in general, and beliefs about the current time in particular; very tight controls on what can be derived from direct contradictions (P, ¬P); and mechanisms allowing an agent to represent and reason about its own beliefs and past reasoning.
12 MCL Application 1: Active Logic

    t:    Now(t)
    ------------------
    t+1:  Now(t+1)

    t:    P, ¬P
    -----------------------
    t+1:  Contra(t, P, ¬P)
13 MCL Application 1: Active Logic
- Essentially, AL continually watches the KB for anomalies, in the form of contradictions.
- When a contradiction is noticed, the system can begin reasoning to deal with the contradiction, including disinheriting premises, looking for more information, etc. (A toy sketch of this step-wise check appears below.)
- AL has been used in several applications.
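A toy sketch of such a step-wise check, assuming a KB of string formulas with "~" for negation; this illustrates the clock and contradiction rules above, and is not the actual AL implementation.

```python
# Toy step-wise reasoner in the spirit of the rules above (not the real AL system).
# Each step advances the clock and records direct contradictions P, ~P as Contra
# facts instead of letting everything follow from them.

def negate(formula: str) -> str:
    return formula[1:] if formula.startswith("~") else "~" + formula


def al_step(t: int, beliefs: set) -> tuple:
    new_beliefs = set(beliefs)
    # Clock rule:  t: Now(t)  =>  t+1: Now(t+1)
    new_beliefs.discard(f"Now({t})")
    new_beliefs.add(f"Now({t + 1})")
    # Contradiction rule:  t: P, ~P  =>  t+1: Contra(t, P, ~P)
    for p in beliefs:
        if not p.startswith("~") and negate(p) in beliefs:
            new_beliefs.add(f"Contra({t}, {p}, {negate(p)})")
    return t + 1, new_beliefs


t, kb = 0, {"Now(0)", "P", "~P", "Q"}
t, kb = al_step(t, kb)
# kb now contains Now(1) and Contra(0, P, ~P); later assessment could
# disinherit one of the contradictands or seek more information.
```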
14 MCL Application 1: Active Logic
- We have been making progress on a semantics for AL that tries to do justice to the fact that real agents:
- Exist in time
- Have a constantly evolving KB, all the consequences of which they do not yet know
- Inevitably face contradictions
- The trouble is, when one has a contradictory KB, it cannot be modeled in the classical sense.
15 MCL Application 1: Active Logic
- To see what sort of model makes sense here, we ask "What must the world seem like to the agent?" instead of "What must the world be like if the KB were true?"
- If the KB contains only P, P→Q, ¬Q, the agent has not yet noticed that this is contradictory.
- The agent knows that P, and knows P implies something, but does not know what it implies. Thus the Q in ¬Q and the Q in P→Q are not (yet) seen as the same formula.
- We say they are superscripted: P¹, P¹→Q¹, ¬Q².
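A rough sketch of this indexing idea, assuming formulas are plain strings and that the agent supplies the set of letters it has already identified across occurrences; this is purely illustrative and not the formal semantics.

```python
# Illustrative only: occurrences of a letter share a superscript exactly when
# the agent has already identified them as the same formula.

def superscript(kb, identified):
    next_index = {}   # per-letter counter for not-yet-identified occurrences
    rendered = []
    for formula in kb:
        out = ""
        for ch in formula:
            if ch.isalpha():
                if ch in identified:
                    idx = 1                              # shared index
                else:
                    idx = next_index.get(ch, 0) + 1      # fresh index per occurrence
                    next_index[ch] = idx
                out += f"{ch}^{idx}"
            else:
                out += ch
        rendered.append(out)
    return rendered


print(superscript(["P", "P->Q", "~Q"], identified={"P"}))
# ['P^1', 'P^1->Q^1', '~Q^2']   (the two Q occurrences stay distinct)
```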
16 MCL Application 1: Active Logic
- We have worked out a definition of model based on these ideas that allows us to define a relevant notion of soundness, such that:
- When reasoning with consistent premises, all classically sound rules are sound for active logic.
- However, not everything that is classically sound remains sound in our sense, for by classical definitions all rules with contradictory premises are vacuously sound, whereas in active logic not everything follows from a contradiction.
17 MCL Application 2: ALFRED
- ALFRED is a domain-independent, natural-language-based HCI system. It is built using active logic.
- ALFRED represents its beliefs, desires, intentions and expectations, and the status of each. It tracks the history of its own reasoning.
- If ALFRED is unable to achieve something, something is taking too long, or an expectation is not met, it assesses this problem and takes one of several corrective actions, such as trying to learn or correcting an error in reasoning.
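A hedged sketch of the kind of expectation tracking just described: each expectation carries a deadline, and overdue expectations are assessed and mapped to a corrective action. The class and method names here are assumptions for illustration, not ALFRED's actual code.

```python
# Illustrative expectation tracking with deadlines (names are placeholders).
import time
from dataclasses import dataclass, field


@dataclass
class Expectation:
    description: str
    deadline: float          # absolute time by which it should be satisfied
    satisfied: bool = False


@dataclass
class DialogueAgent:
    expectations: list = field(default_factory=list)

    def expect(self, description: str, within: float) -> None:
        self.expectations.append(Expectation(description, time.time() + within))

    def check(self) -> list:
        """Assess overdue expectations and return corrective actions."""
        actions = []
        for e in self.expectations:
            if not e.satisfied and time.time() > e.deadline:
                # Here the response is simply to ask the user; a fuller
                # assessment could revise an interpretation or retry a plan.
                actions.append(f"ask user about: {e.description}")
                e.satisfied = True        # avoid re-reporting the same anomaly
        return actions
```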
18 MCL Application 2: ALFRED
Example 1:
User: "Send the Boston train to Atlanta."
Alfred: "OK." (ALFRED chooses a train (train1) in Boston and sends it to Atlanta.)
User: "No, send the Boston train to Atlanta."
Alfred: "OK." (ALFRED recalls train1, but also notices an apparent contradiction: don't send train1, do send train1. ALFRED considers possible causes of this contradiction, and decides the problem is his faulty interpretation of "the Boston train" as train1. He chooses train2, also at Boston, and sends it to Atlanta.)
19 MCL Application 2: ALFRED
- Example 2:
- User: "Shomax Troy."
- Alfred: "I do not know the command 'shomax'. What does 'shomax' mean?"
- User: "Shomax means play the movie and maximize the screen."
- Alfred: "OK." (Shows Troy in fullscreen mode.)
- Alfred gets user feedback to learn the new command "Shomax".
20 MCL Application 2: ALFRED
Example 3:
User (6:00 am): "Send B to B."
Alfred: "I do not know the train B."
User: "B is Bullet."
Alfred: "I do not know the city B."
User: "B is Baltimore."
Alfred: "OK." (Sends the instruction "send Bullet to Baltimore" to the TOS.)
User (6:01 am): "Send B to Richmond."
Alfred: "OK." (Sends the instruction "send Bullet to Richmond" to the TOS.)
21 MCL Application 3: Navigation
- Robby is a simulated Khepera robot with a hybrid reasoner: a neural net with primary navigational control, and a logical reasoner for self-monitoring.
- When Robby has a navigational failure (e.g. a collision), the reasoner notices, assesses the failure and any pattern of failures, and can instruct the net to retrain on a specific set of inputs (see the sketch below).
- Robby exhibits more sensible behavior during training, and learns to navigate more quickly.
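A sketch of such a monitor, assuming the neural controller exposes a retrain(inputs) method; it illustrates the note/assess/guide division of labor and is not Robby's actual code.

```python
# Illustrative navigation monitor: note collisions, assess whether a pattern
# of failures has formed, and guide the (assumed) neural net to retrain.
from collections import Counter


class NavigationMonitor:
    def __init__(self, net, failure_threshold: int = 3):
        self.net = net                    # assumed to expose retrain(inputs)
        self.failures = Counter()         # failure context -> count
        self.failure_threshold = failure_threshold

    def note(self, collided: bool, sensor_context: tuple) -> None:
        if not collided:
            return
        self.failures[sensor_context] += 1
        # Assess: has this kind of failure become a pattern?
        if self.failures[sensor_context] >= self.failure_threshold:
            # Guide: retrain the net on the problematic inputs.
            self.net.retrain([sensor_context])
            self.failures[sensor_context] = 0
```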
22 MCL Application 4: Learning
- Chippy is a reinforcement learner (Q-learning, SARSA, and Prioritized Sweeping) who learns an action policy in a reward-yielding state space.
- He maintains expectations for rewards, and monitors his performance (average reward, average time between rewards).
- If his experience deviates from his expectations (a performance anomaly that we cause by changing the state space), he assesses the anomaly and chooses from a range of responses.
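A sketch of such monitoring: compare the recent average reward against an expected value and, when the deviation is too large, return a response for the learner to apply. The window, threshold, and response names are assumptions for illustration, not Chippy's actual code.

```python
# Illustrative reward-expectation monitor.
from collections import deque
from typing import Optional


class RewardMonitor:
    def __init__(self, expected_avg: float, tolerance: float, window: int = 100):
        self.expected_avg = expected_avg
        self.tolerance = tolerance
        self.rewards = deque(maxlen=window)

    def note(self, reward: float) -> Optional[str]:
        self.rewards.append(reward)
        if len(self.rewards) < self.rewards.maxlen:
            return None                       # not enough evidence yet
        observed = sum(self.rewards) / len(self.rewards)
        if abs(observed - self.expected_avg) <= self.tolerance:
            return None                       # performance within expectations
        # Assess + Guide: choose a response to the performance anomaly.
        if observed < self.expected_avg:
            return "increase_exploration"     # e.g. raise epsilon, or relearn
        return "update_expectations"          # doing better than expected
```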
23 Comparison of the per-turn performance of non-MCL and simple-MCL learners with a degree-8 perturbation from (10, -10) to (-10, 10) at turn 10,001.
25 MCL Application 4: Learning
26 Future Work: Bolo
- Future work will focus on building systems with robust MCL in more sophisticated, dynamic environments. Possible applications include:
- Autonomous search-and-rescue or supply vehicles
- Decision-support reasoning systems
- Multiple-domain human-computer interfaces
27 Future Work: Bolo
- Bolo is a tank game. It's really hard.
- As a first step, we will be implementing a search-and-rescue scenario within Bolo.
- The tank will have to find all the pillboxes and bring them to a safe location.
- However, it will encounter unexpected perturbations along the way: moved pillboxes, changed terrain, and shooting pillboxes.
28 Future Work: Bolo
- It will use a typical 3-tier architecture: reactive, deliberative, and reflective.
- However, our middle tier contains (only) flexible, learning components (see the sketch below).
[Architecture diagram components: Oversight (MCL); Trainer Modules; Trainable Modules; Inference Engine; KB (traditional and symbolic).]
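A rough Python sketch of how the components named in the diagram might fit together; the wiring shown here is an assumption for illustration, not the planned Bolo implementation.

```python
# Illustrative wiring of the diagram's components (assumed, not the real design).

class TrainableModule:
    """Deliberative tier: a flexible, learning component."""
    def retrain(self, examples):
        pass                                  # e.g. update a policy or a net


class InferenceEngine:
    """Symbolic reasoning over a traditional KB."""
    def __init__(self, kb):
        self.kb = kb

    def assess(self, anomaly):
        return "retrain"                      # placeholder assessment


class Oversight:
    """Reflective tier: MCL watching and guiding the learning middle tier."""
    def __init__(self, modules, engine):
        self.modules = modules
        self.engine = engine

    def handle(self, anomaly, examples):
        if self.engine.assess(anomaly) == "retrain":
            for m in self.modules:            # trainer modules drive retraining
                m.retrain(examples)
```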
30 Some Relevant Publications
- Logic, self-awareness and self-improvement: The metacognitive loop and the problem of brittleness. Michael L. Anderson and Donald R. Perlis. Journal of Logic and Computation, 15(1), 2005.
- The roots of self-awareness. Michael L. Anderson and Don Perlis. Phenomenology and the Cognitive Sciences, 4(3), 2005 (in press).
- On the reasoning of real-world agents: Toward a semantics for active logic. Michael L. Anderson, Walid Gomaa, John Grant and Don Perlis. Proceedings of the 7th Annual Symposium on the Logical Formalization of Commonsense Reasoning, Dresden University Technical Report (ISSN 1430-211X), 2005.