Title: Representations for Decision Making Under Uncertainty
1. Representations for Decision Making Under Uncertainty
- Michael L. Littman (Rutgers Univ.)
- mlittman_at_cs.rutgers.edu
2. Example Travel Domain
- 6/23 1530 at(home), email(goal 6/26 0900 at(Cornell))
- 6/23 1530 wait(6/25 1830)
- 6/25 1830 go(mycar, home, EWR)
- 6/25 2030 go(plane, EWR, SYR)
- 6/25 2130 go(alexcar, SYR, alexhouse)
- 6/25 2230 wait(6/26 0800)
- 6/26 0800 go(alexcar2, alexhouse, Cornell)
- 6/26 0900 at(Cornell)
3. Example Travel Domain
- 6/23 1530 at(home), email(goal 6/26 0900 at(Cornell))
- 6/23 1530 wait(6/25 1830), email(6/25 2130 at(plane, EWR))
- 6/25 1830 go(mycar, home, EWR)
- 6/25 2030 go(plane, EWR, SYR)
- 6/25 2130 go(alexcar, SYR, alexhouse)
- 6/25 2230 wait(6/26 0800)
- 6/26 0800 go(alexcar2, alexhouse, Cornell)
- 6/26 0900 at(Cornell)
Revised plan (after the delay notification):
- 6/25 1830 wait(6/25 1930)
- 6/25 1930 go(mycar, home, EWR)
- 6/25 2130 go(plane, EWR, SYR)
- 6/25 2230 go(alexcar, SYR, alexhouse)
- 6/25 2330 wait(6/26 0800)
- 6/26 0800 go(alexcar2, alexhouse, Cornell)
- 6/26 0900 at(Cornell)
4. Example Travel Domain
- 6/23 1530 at(home), email(goal 6/26 0900 at(Cornell))
- 6/23 1530 wait(6/25 1830), email(6/25 2130 at(plane, EWR))
- 6/25 1830 wait(6/25 1930)
- 6/25 1930 go(mycar, home, EWR)
- 6/25 2130 go(plane, EWR, SYR), action failed
- 6/25 2230 go(alexcar, SYR, alexhouse)
- 6/25 2330 wait(6/26 0800)
- 6/26 0800 go(alexcar2, alexhouse, Cornell)
- 6/26 0900 at(Cornell)
Recovery plan (after the failed flight):
- 6/25 2200 go(mycar, EWR, home)
- 6/25 2230 wait(6/26 0400)
- 6/26 0400 go(mycar, home, Cornell)
- 6/26 0900 at(Cornell)
5. Things to Notice
- Real-world tasks are uncertain.
- Need to think ahead, react to surprises.
- I'm tired.
- Level of description is carefully chosen and requires significant engineering:
  - which details to suppress
  - model transition probabilities, costs
  - varies with task
Gap
6. Logical Representation
- STRIPS + belief nets (Littman & Younes 03).
- go(vehicle, from, to) (sketched in code below)
  - if (at(from) and at(vehicle, from))
    - 0.7: not(at(from)), not(at(vehicle, from)), at(to), at(vehicle, to)
    - 0.3: failure
  - else: failure
- First planning competition at ICAPS-04.
- Progress in algorithm design.
- Domain engineering is a major challenge.
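A minimal sketch of how this probabilistic go operator could be encoded, using plain Python sets of fluent tuples rather than the actual STRIPS/belief-net syntax of Littman & Younes 03; the fluent encoding and function names are illustrative assumptions.

```python
import random

def go(state, vehicle, frm, to):
    """Illustrative probabilistic STRIPS-style operator go(vehicle, from, to).
    'state' is a set of fluent tuples; this encoding is an assumption,
    not the Littman & Younes 03 representation itself."""
    # Precondition: the agent and the vehicle are both at 'frm'.
    if ("at", frm) in state and ("at", vehicle, frm) in state:
        if random.random() < 0.7:   # probability 0.7: the move succeeds
            return (state - {("at", frm), ("at", vehicle, frm)}
                    | {("at", to), ("at", vehicle, to)}), "success"
        return state, "failure"     # probability 0.3: the action fails
    return state, "failure"         # precondition unmet: failure

# Example: attempt to drive mycar from home to EWR.
start = {("at", "home"), ("at", "mycar", "home")}
result, outcome = go(start, "mycar", "home", "EWR")
```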
7. Predictive State Representation
- To be learnable, a representation should be based on measurable quantities.
- Can use action-conditional predictions: "If I put my hand in my pocket, will I find my keys there?" (e.g., Littman, Sutton & Singh 02)
- Some early examples are being learned.
8. Instance-based Representation
- Uses experience directly to serve as a knowledge representation. Non-parametric.
- Has been applied to continuous and partially observable environments, using the subset of nearby instances.
- Also memory-based (Moore & Atkeson 93; Santamaría et al. 97; Forbes & Andre 00; Smart & Kaelbling 00; McCallum 95)
[Figure: stored instances plotted in joint-angle space]
9. Optimal Disk Repair Policy
- Disk online? (Littman, Fenson, Howard, Hirsh & Nguyen 03)
  - yes: Restart software
    - success: Done (Software bug, Transient disk error)
    - failure: Replace disk (Partial disk failure)
  - no: Other disk online?
    - yes: Operator replug any loose cables
      - success: Done (Cable disconnected)
      - failure: Operator replug all cables
        - success: Done (Cables crossed)
        - failure: Replace disk (Permanent disk failure)
    - no: Replace SCSI controller (SCSI controller failure)
10. Experience-based Knowledge Formation
- Objective: System creates and maintains knowledge.
- Current:
  - humans design the representation (meaningless)
  - raw data
- Key Challenges:
  - need a representation grounded in subjective experience; self-supervised learning
  - may require sensors and actions
11. Reinforcement Learning + Rich Sensors
- Objective: Need lots of sensors for understanding; use them to guide behavior.
- Current:
  - completely observable domains
  - nearly unobservable domains
- Key Challenges:
  - how to represent visual, auditory features?
  - how to deal with continuous parameter values?
  - temporal dependencies? (instance-based RL?)
12. Robust Analogical Reasoning
- Objective: Use grounded information flexibly in reasoning; learn more efficiently.
- Current:
  - analogies over fragile knowledge representations
  - ground-level reasoning with no analogies
- Key Challenges:
  - how to learn what can substitute for what?
  - how to do fast matching?
  - an active replacement for KR?
13. The (RL) Problem
- Decision maker interacts with environment.
- x1, a1, r1, x2, a2, r2, x3, a3, r3, ...
- Objective: maximize long-term total reward (interaction loop sketched below).
[Figure: agent-environment loop; the agent sends actions to the environment and receives observations and reward]
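As a concrete reading of the x, a, r sequence above, here is a minimal sketch of the interaction loop; env and agent are hypothetical objects with the small interface shown, not part of the talk.

```python
def run_episode(env, agent, horizon=100):
    """Sketch of the RL interaction loop x1, a1, r1, x2, a2, r2, ...
    'env' and 'agent' are assumed objects exposing the methods used below."""
    total_reward = 0.0
    x = env.reset()                     # initial observation x_1
    for t in range(horizon):
        a = agent.act(x)                # choose action a_t from observation x_t
        x_next, r = env.step(a)         # environment returns x_{t+1} and reward r_t
        agent.update(x, a, r, x_next)   # learn from the transition
        total_reward += r               # objective: long-term total reward
        x = x_next
    return total_reward
```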
14. Emerging Applications
- Monitoring and repair of LANs
- Network management
- Power grid restoration
- Robotics (RoboCup, legged league)
- Stock trading
- Active vision
15. Issue of State
- In most RL research, xt is Markov (and finite).
- This is quite helpful (Sutton & Barto 98):
  - MDP model, efficient planning algorithms.
  - Depend on value function V(x).
  - Create policy π(x).
- Polytime learning of a near-optimal policy (e.g., Kearns & Singh 98).
- Needs many exposures to each state.
16. Some Harder Cases
- Continuous environments
  - Sensors report continuous values.
  - Sure, it's Markov, but states don't recur.
- Non-Markovian environments
  - A.k.a. partially observable environments.
  - Sensors provide incomplete information.
  - Not clear what to use as states.
17. Planning with States
- What's all the fuss about states?
- Just received observation xt. What to do?
- Given our experience, which action will lead to the maximum reward?
- Estimate rewards and transitions, then solve
  - Q(s,a) = R(s,a) + Σs' T(s,a,s') maxa' Q(s',a')
- to choose optimal actions (value-iteration sketch below).
- (Smooth estimates, reward exploration.)
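A sketch of solving the estimated model by value iteration; the array shapes and the discount factor gamma (not shown on the slide) are assumptions.

```python
import numpy as np

def q_from_model(R, T, gamma=0.95, iters=200):
    """Value iteration on an estimated model:
    Q(s,a) = R(s,a) + gamma * sum_s' T(s,a,s') * max_a' Q(s',a').
    R has shape (S, A); T has shape (S, A, S'); gamma is an assumed discount."""
    Q = np.zeros(R.shape)
    for _ in range(iters):
        V = Q.max(axis=1)         # V(s') = max_a' Q(s', a')
        Q = R + gamma * T.dot(V)  # dot contracts over s', giving shape (S, A)
    return Q

# Greedy policy from the solved values: pi(s) = argmax_a Q(s, a)
# policy = q_from_model(R_hat, T_hat).argmax(axis=1)
```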
18. Proposal for State 1: Window
- History window (length k)
- State at t is xt-k+1, at-k+1, ..., xt-2, at-2, xt-1, at-1, xt (sketched below).
- Often appropriate, at least approximately.
- Can't capture many simple environments.
- Often scales badly.
- Learning is quite direct (observable).
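A minimal sketch of building the length-k window state from an alternating observation/action history; the flat-list encoding of the history is an assumption.

```python
from collections import deque

def window_state(history, k):
    """History-window state at time t: the last k observations and the k-1
    actions between them, (x_{t-k+1}, a_{t-k+1}, ..., x_{t-1}, a_{t-1}, x_t).
    'history' is assumed to be the flat alternating list [x1, a1, x2, a2, ..., xt]."""
    return tuple(history[-(2 * k - 1):])

# Incremental version: a bounded deque drops old entries automatically (k = 3).
window = deque(maxlen=2 * 3 - 1)
for item in ["x1", "a1", "x2", "a2", "x3"]:
    window.append(item)
state = tuple(window)   # ('x1', 'a1', 'x2', 'a2', 'x3')
```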
19. Proposal for State 2: POMDP
- Partially observable Markov decision process
- Assert hidden states that control the dynamics.
- Track a distribution over hidden states (belief update sketched below).
- (Belief) states are continuous, so tabular Q can't be used.
- Sophisticated algorithms have been devised.
- Captures more complex environments.
- Hard to learn: hidden is hidden.
- (Monahan 82; Sondik 71; Cassandra et al. 97; Chrisman 92)
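A sketch of the standard belief-state update for tracking the distribution over hidden states; the matrix layout (T[a] an S x S transition matrix, O[a] an S x observations likelihood matrix) is an assumption.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """POMDP belief tracking: b'(s') is proportional to
    O[a][s', o] * sum_s T[a][s, s'] * b(s).
    b is a length-S probability vector; T[a] is S x S; O[a] is S x num_obs."""
    predicted = b @ T[a]               # predicted next-state distribution
    unnorm = predicted * O[a][:, o]    # weight by likelihood of observation o
    return unnorm / unnorm.sum()       # renormalize to a proper belief state
```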
20. Proposal for State 3: PSR
- Predictive state representation
- States defined by predictions of tests.
- Tests: sequences of future history windows.
- Test outcomes are independently verifiable.
- A linear update can express POMDPs (sketched below).
- Prediction vectors are also continuous states.
- Learning algorithms are under development.
- Not clear which tests to use in the representation.
- (Littman, Sutton & Singh 02; Singh et al. 03)
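A sketch of the linear prediction-vector update in the spirit of Littman, Sutton & Singh 02; the dictionary layout of the update parameters M and m is an assumed encoding.

```python
import numpy as np

def psr_update(p, a, o, M, m):
    """Linear PSR update: p holds the predicted success probabilities of the
    core tests given the current history.  After doing action a and seeing
    observation o, condition the predictions on that one-step outcome.
    M[(a, o)]: matrix giving predictions of the extended tests (a, o) + core test.
    m[(a, o)]: vector giving the prediction of the one-step test (a, o)."""
    numerator = p @ M[(a, o)]       # predictions of the extended tests
    denominator = p @ m[(a, o)]     # probability of seeing o after doing a
    return numerator / denominator  # updated prediction vector
```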
21. Instance-based + Continuous
- Basic idea: for continuous, Markov x's.
- Store instances (one per time step t) in a database.
- Assign a value to each (TD style).
- Approximate the value of xt using nearby instances (sketched below).
- Also memory-based (Moore & Atkeson 93; Santamaría et al. 97; Forbes & Andre 00; Smart & Kaelbling 00)
[Figure: stored instances plotted in joint-angle space]
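A minimal sketch of the nearby-instance value estimate; the inverse-distance weighting and the tie-breaking constant are assumptions.

```python
import numpy as np

def knn_value(x, instances, values, k=5):
    """Instance-based value estimate for a continuous Markov state x:
    average the stored (TD-style) values of the k nearest stored instances,
    weighting closer instances more heavily (weighting scheme assumed)."""
    X = np.asarray(instances)                  # shape (n, d): one row per stored state
    dists = np.linalg.norm(X - np.asarray(x), axis=1)
    nearest = np.argsort(dists)[:k]            # indices of the k closest instances
    weights = 1.0 / (dists[nearest] + 1e-6)    # small constant avoids division by zero
    return np.average(np.asarray(values)[nearest], weights=weights)
```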
22. Instance Subsets as State
- Instance-based approaches are generally
  - viewed as value-function approximation,
  - viewing the state space as continuous (vs. tabular Q).
- Idea: may be fruitful to view this as a state representation!
- View the instant at time t as matching a subset of previously stored instances.
- Plan using subsets of instances as states.
23. Issues of Instance Subsets
- Tractable: discrete, directly observable states.
- Lots of subsets.
- Most are not reachable.
- Typically (sub?)linear in the number of instances.
- Still big; may need careful forgetting.
- Similarity function often designed by hand.
- Like clustering, but can be overlapping.
- Often wildly overestimates values.
- May encourage exploration if we replan.
24. Instance-based + Non-Markovian
- What constitutes an instance?
- Nearest sequence memory (McCallum 95)
  - The instant t is the entire history up to t.
  - Neighbors are the top-k suffix matches (sketched below).
  - Unlike a history window, it is context sensitive.
  - Superior to a POMDP-learning approach.
  - Scaling issues, but promising.
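A sketch of the suffix-matching neighbor selection in the spirit of nearest sequence memory (McCallum 95); the flat history encoding and scoring details are assumptions.

```python
def suffix_match_length(history, t, s):
    """Length of the common suffix of the histories ending at times t and s."""
    n = 0
    while t - n >= 0 and s - n >= 0 and history[t - n] == history[s - n]:
        n += 1
    return n

def nearest_sequences(history, t, k=3):
    """The 'instance' at time t is the whole history up to t; its neighbors
    are the k earlier time steps whose history suffixes match t's the longest.
    'history' is assumed to be a list of (observation, action, reward) steps."""
    scores = [(suffix_match_length(history, t, s), s) for s in range(t)]
    scores.sort(reverse=True)
    return [s for _, s in scores[:k]]
```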
25. Recap
- RL when continuous or non-Markovian.
- Exploring instance-based state representations:
  - continuous: mountain car
  - non-Markovian: fault remediation
- Focus for the remainder of the talk:
  - Problem area
  - Formal model
  - Algorithmic progress
  - Mini demo
26. Networking Application
- Browsing the web, can't reach Yahoo!
- What's wrong?
- Make it better!
27. Modeling Assumptions
- Hard problem, so we make simplifying assumptions:
  - Fault remediation episodes are independent.
  - High-level actions provide information or repair.
  - Otherwise, no fault-mode changes.
  - Want the minimum-cost repair.
28. Cost-sensitive Fault Remediation
- Cost-sensitive diagnosis
  - no state transitions, actions are informational only
  - episode ends when a diagnosis is made
- Cost-sensitive fault remediation
  - still no state change, repair is all or nothing
  - episode continues until the objective is achieved
- Episodic partially observable MDP
  - actions change state and provide information
  - episode continues until the objective is achieved
29. Conceptual Model
- Cost-sensitive fault remediation:
  - Set of underlying fault modes
  - Actions observe attributes, with costs
  - Actions remediate fault modes, with costs, repairing some and failing on others
- (Greiner et al. 96; Turney 00; Guo 02; Zubek & Dietterich 02)
30. Example Observables
31. Example Remedial Actions, Faults
32. Cost-sensitive Classification
- Decisions: what to observe, guess the class
- All classes are remedial.
- Misclassification cost is on par with observation cost:
  - too cheap: pick the most likely class
  - too expensive: observe everything
- (Turney)
33. CSFR Formal Definition
- Set of fault modes C, observables T, remediation actions R (data-structure sketch below)
- Fault probability distribution p(c) over c
- Cost function j(t,c) for observing t in mode c
- Observation model b(t,c): the probability of observing 1 using t in mode c
- Remediation costs m(r,c) for taking r in mode c
- Goal function G(r,c) returns 1 if r repairs c
- For every c there is some r such that G(r,c) = 1.
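One way to hold this definition in code, sketched as a plain data structure; the field names simply mirror the slide's notation and are not from an existing implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class CSFRModel:
    """Sketch of a CSFR: fault modes C, observables T, remediations R,
    plus the cost, observation, and goal functions from the definition."""
    faults: List[str]                           # fault modes C
    observables: List[str]                      # observables T
    repairs: List[str]                          # remediation actions R
    prior: Dict[str, float]                     # p(c): fault probability distribution
    obs_cost: Dict[Tuple[str, str], float]      # j(t, c): cost of observing t in mode c
    obs_model: Dict[Tuple[str, str], float]     # b(t, c): Pr(observe 1 using t in c)
    rep_cost: Dict[Tuple[str, str], float]      # m(r, c): cost of taking r in mode c
    goal: Dict[Tuple[str, str], int]            # G(r, c) = 1 iff r repairs c

    def well_formed(self) -> bool:
        # Every fault mode c must have at least one repair r with G(r, c) = 1.
        return all(any(self.goal.get((r, c), 0) == 1 for r in self.repairs)
                   for c in self.faults)
```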
34. Disk Diagnosis Example
- Failure modes:
  - Permanent disk failure
  - Partial disk failure
  - Transient disk error
  - Software bug
  - Cable disconnected
  - Cables crossed
  - SCSI controller failure
- Tests and remedial actions, with costs:
  - Disk online? (1)
  - Disk ok? (600)
  - All cables plugged in? (60)
  - Other disk online? (1)
  - Restart software (30)
  - Operator replug any loose cables (70)
  - Operator replug all cables (420)
  - Replace disk (900)
  - Replace SCSI controller (900)
35. Disk Diagnosis Example
- Observations:
  - Disk Online?, Read?, Write?, Scan Disk?, Create File?
- Remediation:
  - Reboot, Replace (Copied) Disk, Remove Unwanted Files
36. Disk Diagnosis Faults
- Faults, with MTTF and probability:
  - Disk Hang: MTTF 1 yr, prob 0.0400
  - Disk Timeout: MTTF 5 wks, prob 0.4340
  - Read Failure (Bad Sector), Caught: MTTF 35 yrs, prob 0.0001
  - Read Failure (Bad Sector), Not Caught: prob 0.0010
  - Write Failure (Bad Sector), Caught: MTTF 35 yrs, prob 0.0001
  - Write Failure (Bad Sector), Not Caught: prob 0.0010
  - Disk Full: MTTF 4 wks, prob 0.5230
37. Optimal Policy
- Disk Online?
  - yes: Read?
    - yes: Create File?
      - yes: Replace Disk (Read Failure (Bad Sector) Not Caught, Write Failure (Bad Sector) Caught, or Write Failure (Bad Sector) Not Caught)
      - no: Remove Unwanted Files (Disk Full)
    - no: Reboot (Disk Timeout)
      - failure: Replace Disk (Read Failure (Bad Sector) Caught)
  - no: Reboot (Disk Hang)
38. Optimal Disk Repair Policy
- Disk online?
  - yes: Restart software
    - success: Done (Software bug, Transient disk error)
    - failure: Replace disk (Partial disk failure)
  - no: Other disk online?
    - yes: Operator replug any loose cables
      - success: Done (Cable disconnected)
      - failure: Operator replug all cables
        - success: Done (Cables crossed)
        - failure: Replace disk (Permanent disk failure)
    - no: Replace SCSI controller (SCSI controller failure)
39. Things to Notice
- Different from people (gearing up for tests).
- Some actions repair multiple faults.
- Some remedial actions fail.
- Each fault appears only once.
- Closed world assumption.
40. Learning in CSFRs
- Can plan in a CSFR model, but where does the model come from?
- Fault modes are unknown.
- Episodes end with a successful repair.
- Necessarily incomplete information.
- Instance-based approach:
  - Instances are the episodes' actions and results.
  - The instant matches a subset of instances.
  - Use the matches to estimate a model, then plan.
41. A DP Planning Algorithm
- A state is a subset s of the set of episodes; o is an observable, r a remediation.
- Recursively compute (sketched below):
  - Cost of optimal repair:
    V(s) = min( min_o Q(s,o), min_r Q(s,r) )
  - Cost of repair starting with observation o:
    Q(s,o) = c(s,o) + Pr(o=1|s) V(s|o=1) + Pr(o=0|s) V(s|o=0)
  - Cost of repair starting with remediation r:
    Q(s,r) = c(s,r) + Pr(r fails|s) V(s|r fails)
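A sketch of the recursion; the helper methods on 'model' (costs, outcome probabilities, and the conditioned subsets) are assumed, and the pruning rules of the next slide are what keep the subsets shrinking so the recursion terminates.

```python
def plan_cost(episodes, model, memo=None):
    """Recursive DP over instance subsets: returns V(s) for the subset s of
    stored episodes.  All 'model' helper methods below are assumptions."""
    memo = {} if memo is None else memo
    key = frozenset(episodes)
    if key in memo:
        return memo[key]
    candidates = []
    # Q(s, o): pay the observation cost, then branch on the outcome.
    for o in model.informative_observables(episodes):
        p1 = model.prob_obs(episodes, o)                 # Pr(o = 1 | s)
        cost = model.obs_cost(episodes, o)
        if p1 > 0:
            cost += p1 * plan_cost(model.restrict(episodes, o, 1), model, memo)
        if p1 < 1:
            cost += (1 - p1) * plan_cost(model.restrict(episodes, o, 0), model, memo)
        candidates.append(cost)
    # Q(s, r): pay the repair cost; only the failure branch continues.
    for r in model.useful_repairs(episodes):
        p_fail = model.prob_repair_fails(episodes, r)    # 1 - Pr(r repairs | s)
        cost = model.rep_cost(episodes, r)
        if p_fail > 0:
            cost += p_fail * plan_cost(model.failed_subset(episodes, r), model, memo)
        candidates.append(cost)
    value = min(candidates)                              # V(s)
    memo[key] = value
    return value
```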
42. Algorithm Analysis
- Two important details:
  - Choose only observables that provide information or have repaired faults.
  - If no informative actions are left, choose a remediation.
- Then, the state-space size is bounded by the number of subsets of true fault modes.
43. Algorithmic Limitations
- Optimal if there are few faults and (nearly) deterministic observables, easily explored.
- Moving ahead:
  - need smart exploration
  - better scaling with many (related) faults
  - handle noisy, continuous observables
  - deal with some side effects
44. Practical Considerations
- CSFR is a big improvement over classification.
- Fault modes are analogous to classes, but automatically identified: autonomous.
- But how do we know if a repair succeeded?
- Primary complaint
- Monitors
- Triggers common to both (start with info).
45. Networking Demo Actions
- Observables
- Is the network medium physically connected?
- Is the active interface wireless?
- Is active interface DHCP-Enabled?
- Ping my IP?
- Ping localhost?
- Ping Gateway?
- DNS lookup
- My IP setting looks valid
- My netmask setting looks valid
- My DNS setting looks valid
- My gateway setting looks valid
- Can I Reach PnP?
- Can PnP reach DNS?
Remediation actions:
- Plug your network cable back in (or restore your wireless connection)
- Renew DHCP lease
- Fix IP setting
- Check your router's physical connection to your ISP
- Check your router's physical connection to your LAN
- Contact ISP and report that their DNS server appears to be down
- Contact ISP and report that you cannot reach their DNS server
- Contact ISP and report that you cannot reach pnphome.com
46. Network Demo Notes
- My gateway setting looks valid.
- My netmask setting looks valid.
- Renew DHCP lease.
47. Conclusions
- Instance-based advantages:
  - efficient in experience (the expensive commodity)
  - no catastrophic forgetting (rare events!)
  - memory is cheap
- Instance-based state representation advantages:
  - model-based planning and exploration are explicit
  - state representation with minimal analysis
  - incrementally modifiable (discover structure)
  - reduces to a previously researched problem!