Representations for Decision Making Under Uncertainty - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Representations for Decision Making Under Uncertainty


1
Representations for Decision Making Under
Uncertainty
  • Michael L. Littman (Rutgers Univ.)
  • mlittman@cs.rutgers.edu

2
Example Travel Domain
  • 6/23 1530 at(home), email(goal: 6/26 0900
    at(Cornell))
  • 6/23 1530 wait(6/25 1830)
  • 6/25 1830 go(mycar, home, EWR)
  • 6/25 2030 go(plane, EWR, SYR)
  • 6/25 2130 go(alexcar, SYR, alexhouse)
  • 6/25 2230 wait(6/26 0800)
  • 6/26 0800 go(alexcar2, alexhouse, Cornell)
  • 6/26 0900 at(Cornell)

3
Example Travel Domain
  • 6/23 1530 at(home), email(goal: 6/26 0900
    at(Cornell))
  • 6/23 1530 wait(6/25 1830), email(6/25 2130
    at(plane, EWR))
  • 6/25 1830 go(mycar, home, EWR)
  • 6/25 2030 go(plane, EWR, SYR)
  • 6/25 2130 go(alexcar, SYR, alexhouse)
  • 6/25 2230 wait(6/26 0800)
  • 6/26 0800 go(alexcar2, alexhouse, Cornell)
  • 6/26 0900 at(Cornell)

Alternate plan (side note on slide):
  • 6/25 1830 wait(6/25 1930)
  • 6/25 1930 go(mycar, home, EWR)
  • 6/25 2130 go(plane, EWR, SYR)
  • 6/25 2230 go(alexcar, SYR, alexhouse)
  • 6/25 2330 wait(6/26 0800)
  • 6/26 0800 go(alexcar2, alexhouse, Cornell)
  • 6/26 0900 at(Cornell)
4
Example Travel Domain
  • 6/23 1530 at(home), email(goal: 6/26 0900
    at(Cornell))
  • 6/23 1530 wait(6/25 1830), email(6/25 2130
    at(plane, EWR))
  • 6/25 1830 wait(6/25 1930)
  • 6/25 1930 go(mycar, home, EWR)
  • 6/25 2130 go(plane, EWR, SYR), action failed
  • 6/25 2230 go(alexcar, SYR, alexhouse)
  • 6/25 2330 wait(6/26 0800)
  • 6/26 0800 go(alexcar2, alexhouse, Cornell)
  • 6/26 0900 at(Cornell)

Alternate plan (side note on slide):
  • 6/25 2200 go(mycar, EWR, home)
  • 6/25 2230 wait(6/26 0400)
  • 6/26 0400 go(mycar, home, Cornell)
  • 6/26 0900 at(Cornell)
5
Things to Notice
  • Real-world tasks are uncertain.
  • Need to think ahead, react to surprises.
  • I'm tired.
  • Level of description carefully chosen; requires
    significant engineering:
  • which details to suppress
  • model transition probabilities, costs
  • varies with task

Gap
6
Logical Representation
  • STRIPS belief nets (Littman & Younes 03).
  • go(vehicle, from, to)
  • if (at(from) and at(vehicle, from)):
  • 0.7: not(at(from)), not(at(vehicle, from)),
    at(to), at(vehicle, to)
  • 0.3: failure
  • else: failure
  • First planning competition at ICAPS-04.
  • Progress in algorithm design.
  • Domain engineering a major challenge.
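The probabilistic effect above can be sampled directly. A minimal sketch, assuming states are sets of ground atoms such as ('at', 'home'); the 0.7/0.3 split is the one on the slide, and the `rng` parameter is a hypothetical hook added here for testability:

```python
import random

def apply_go(state, vehicle, frm, to, rng=random.random):
    """Sample one outcome of the probabilistic go action.

    state: set of ground atoms, e.g. ('at', 'home').
    Returns (next_state, succeeded). With probability 0.7 the move
    succeeds; otherwise (or if the precondition fails) the state is
    unchanged.
    """
    if ('at', frm) in state and ('at', vehicle, frm) in state:
        if rng() < 0.7:  # successful move
            next_state = set(state)
            next_state -= {('at', frm), ('at', vehicle, frm)}
            next_state |= {('at', to), ('at', vehicle, to)}
            return next_state, True
    return set(state), False  # precondition unmet or action failed
```

Passing a fixed `rng` makes both branches reproducible, which mirrors how the planning-competition simulators sample outcomes.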

7
Predictive State Representation
  • To be learnable, representation based on
    measurable quantities.
  • Can use action-conditional predictions If I put
    my hand in my pocket, will I find my keys there?
    (e.g., Littman, Sutton & Singh 02)
  • Some early examples being learned.

8
Instance-based Representation
  • Uses experience directly to serve as a knowledge
    representation. Non-parametric.
  • Has been applied to continuous and partially
    observable environments. Subset nearby.
  • Also memory-based (Moore & Atkeson 93;
    Santamaría et al. 97; Forbes & Andre 00; Smart &
    Kaelbling 00; McCallum 95)

[figure: joint angle space]
9
Optimal Disk Repair Policy
  • Disk online? (Littman, Fenson, Howard, Hirsh &
    Nguyen 03)
  • yes: Restart software
  • success: Done (Software bug, Transient disk
    error)
  • failure: Replace disk (Partial disk failure)
  • no: Other disk online?
  • yes: Operator replug any loose cables
  • success: Done (Cable disconnected)
  • failure: Operator replug all cables
  • success: Done (Cables crossed)
  • failure: Replace disk (Permanent disk failure)
  • no: Replace SCSI controller (SCSI controller
    failure)

10
Experience-based Knowledge Formation
  • Objective: System creates, maintains knowledge
  • Current:
  • humans design representation (meaningless)
  • raw data
  • Key Challenges:
  • need representation grounded in subjective
    experience; self-supervised learning
  • may require sensors and actions

11
Reinforcement Learning Rich Sensors
  • Objective: Need lots of sensors for
    understanding; use them to guide behavior
  • Current:
  • completely observable domains
  • nearly unobservable domains
  • Key Challenges:
  • how to represent visual, auditory features?
  • deal with continuous parameter values?
  • temporal dependencies? (Instance-based RL?)

12
Robust Analogical Reasoning
  • Objective: Use grounded information flexibly in
    reasoning; learn more efficiently
  • Current:
  • analogies over fragile knowledge reps
  • ground-level reasoning with no analogies
  • Key Challenges:
  • how to learn what can substitute for what?
  • how to do fast matching?
  • active replacement for KR?

13
The (RL) Problem
  • Decision maker interacts with environment.
  • x1, a1, r1, x2, a2, r2, x3, a3, r3
  • Objective maximize long-term total reward.

[figure: agent-environment loop (actions, observations, reward)]
14
Emerging Applications
  • Monitoring and repair of LANs
  • Network management
  • Power grid restoration
  • Robotics (RoboCup, legged league)
  • Stock trading
  • Active vision

15
Issue of State
  • In most RL research, xt is Markov (and finite).
  • This is quite helpful (Sutton & Barto 98):
  • MDP model, efficient planning algs.
  • Depend on value function V(x).
  • Create policy p(x).
  • Polytime learning of near optimal policy (Kearns
    & Singh 98, e.g.).
  • Needs many exposures to each state.

16
Some Harder Cases
  • Continuous environments
  • Sensors report continuous values.
  • Sure, Markov, but states dont recur.
  • Non-Markovian environments
  • A.k.a. partially observable environments.
  • Sensors provide incomplete information.
  • Not clear what to use as states.

17
Planning with States
  • What's all the fuss about states?
  • Just received observation xt. What to do?
  • Given our experience, which action will lead to
    the maximum reward?
  • Estimate rewards and transitions, solve
  • Q(s,a) = R(s,a) + Σs' (T(s,a,s') maxa'
    Q(s',a'))
  • to choose optimal actions.
  • (Smooth estimates, reward exploration.)
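The Q-update above can be iterated to a fixed point once R and T are estimated. A minimal sketch, assuming tabular dictionaries; a discount factor is added here for convergence (the slide's equation is undiscounted):

```python
def q_value_iteration(R, T, gamma=0.95, iters=200):
    """Iterate Q(s,a) = R(s,a) + gamma * sum_s' T(s,a,s') max_a' Q(s',a').

    R: dict mapping (s, a) -> expected reward.
    T: dict mapping (s, a) -> {s': transition probability}.
    """
    Q = {sa: 0.0 for sa in R}

    def best(s):
        # max over actions available in state s (under the old Q)
        return max(q for (s2, _), q in Q.items() if s2 == s)

    for _ in range(iters):
        # synchronous backup: the comprehension reads the old Q via best()
        Q = {(s, a): r + gamma * sum(p * best(s2)
                                     for s2, p in T[(s, a)].items())
             for (s, a), r in R.items()}
    return Q
```

Acting greedily with respect to the resulting Q then gives the optimal policy for the estimated model.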

18
Proposal for State 1: Window
  • History window (length k)
  • State at t is xt-k+1, at-k+1, ..., xt-2, at-2,
    xt-1, at-1, xt.
  • Often appropriate, at least approximately.
  • Can't capture many simple environments.
  • Often scales badly.
  • Learning quite direct (observable).
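The window state is just a fixed-length slice of the history. A minimal sketch, with hypothetical names; histories are lists of (observation, action) pairs:

```python
def window_state(past_steps, x_t, k):
    """History-window state of length k: the last k-1
    (observation, action) pairs followed by the current
    observation x_t.

    past_steps: list of (x_i, a_i) pairs for times i < t.
    """
    tail = past_steps[-(k - 1):] if k > 1 else []
    return tuple(tail) + (x_t,)
```

Returning a tuple makes the state hashable, so it can index a tabular Q directly, which is why learning with this representation is "quite direct."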

19
Proposal for State 2: POMDP
  • Partially observable Markov decision process
  • Assert hidden states control dynamics.
  • Track distribution over hidden states.
  • (Belief) states are continuous; can't use Q.
  • Sophisticated algorithms have been devised.
  • Captures more complex environments.
  • Hard to learn: hidden is hidden.
  • (Monahan 82; Sondik 71; Cassandra et al. 97;
    Chrisman 92)

20
Proposal for State 3: PSR
  • Predictive state representation
  • States defined by predictions of tests.
  • Tests: sequences of future history windows.
  • Test outcomes are independently verifiable.
  • Linear update can express POMDPs.
  • Prediction vectors are also continuous states.
  • Learning algorithms under development.
  • Not clear which tests to use in representation.
  • (Littman, Sutton & Singh 02; Singh et al. 03)
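The linear update mentioned above can be sketched directly: in a linear PSR, each action-observation pair (a, o) has an update matrix for the core tests and an update vector for the empty test. A minimal illustration with hypothetical names:

```python
def psr_update(p, M_ao, m_ao):
    """One linear PSR update after taking action a and seeing o.

    p: prediction vector for the core tests.
    M_ao: update matrix (rows indexed like p).
    m_ao: update vector for the empty test.
    New predictions: p'_i = (p . M_ao[:, i]) / (p . m_ao)
    """
    dot = lambda u, v: sum(x * y for x, y in zip(u, v))
    denom = dot(p, m_ao)  # probability of (a, o) under p
    if denom == 0.0:
        raise ValueError("action-observation pair has probability 0")
    cols = len(M_ao[0])
    return [dot(p, [col[i] for col in M_ao]) / denom for i in range(cols)]
```

The denominator is itself a prediction (the probability of the observation), which is what makes PSR states measurable quantities rather than hidden ones.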

21
Instance-based Continuous
  • Basic idea: For continuous, Markov x's.
  • Store instances (one per t) in a database.
  • Assign a value to each (TD style).
  • Approximate value of xt using nearby
    instances.
  • Also memory-based (Moore & Atkeson 93;
    Santamaría et al. 97; Forbes & Andre 00; Smart &
    Kaelbling 00)

[figure: joint angle space]
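The "approximate value using nearby instances" step is a nearest-neighbor average. A minimal sketch, assuming instances are (state-vector, value) pairs, e.g. points in joint angle space:

```python
import math

def knn_value(x, instances, k=3):
    """Approximate V(x) as the mean stored value of the k instances
    nearest to x in Euclidean distance.

    instances: list of (state_vector, value) pairs.
    """
    nearest = sorted(instances, key=lambda iv: math.dist(x, iv[0]))[:k]
    return sum(v for _, v in nearest) / len(nearest)
```

A distance-weighted average or a kernel would be the usual refinement; a hand-designed similarity function can replace `math.dist` when raw Euclidean distance is not meaningful.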
22
Instance Subsets as State
  • Instance-based approaches generally:
  • viewed as value function approximation,
  • view state space as continuous (vs. Q).
  • Idea: May be fruitful to view as state rep!
  • View instant at time t as matching a subset of
    previously stored instances.
  • Plan using subsets of instances as state.

23
Issues of Instance Subsets
  • Tractable: Discrete, directly observable states.
  • Lots of subsets.
  • Most not reachable.
  • Typically (sub?) linear in number of instances.
  • Still big; may need careful forgetting.
  • Similarity function often designed by hand.
  • Like clustering, but can be overlapping.
  • Often wildly overestimates values.
  • May encourage exploration if replan.

24
Instance-based Non-Markovian
  • What constitutes an instance?
  • Nearest sequence memory (McCallum 95)
  • Instant t is entire history up to t.
  • Neighbors are top k suffix matches.
  • Unlike history window, context sensitive.
  • Superior to POMDP-learning approach.
  • Scaling issues, but promising.
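Nearest sequence memory matches on the longest shared suffix of histories rather than a fixed window. A minimal sketch of that matching step, with hypothetical names:

```python
def suffix_match_length(hist_a, hist_b):
    """Length of the longest common suffix of two histories."""
    n = 0
    while n < min(len(hist_a), len(hist_b)) and hist_a[-1 - n] == hist_b[-1 - n]:
        n += 1
    return n

def k_nearest_suffixes(current, stored, k=3):
    """Indices of the k stored histories sharing the longest suffix
    with the current history (NSM-style neighbor selection)."""
    ranked = sorted(range(len(stored)),
                    key=lambda i: suffix_match_length(current, stored[i]),
                    reverse=True)
    return ranked[:k]
```

Because the match length adapts per query, the effective context is long where the environment demands it and short where it does not, which is what makes this context sensitive where a fixed window is not.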

25
Recap
  • RL when continuous / non-Markovian.
  • Exploring instance-based state rep.
  • continuous: mountain car
  • non-Markovian: fault remediation
  • Focus for the remainder of the talk:
  • Problem area
  • Formal model
  • Algorithmic progress
  • Mini demo

26
Networking Application
  • Browsing the web, can't reach Yahoo!
  • What's wrong?
  • Make it better!

27
Modeling Assumptions
  • Hard problem; simplifying assumptions:
  • Fault remediation episodes independent.
  • High-level actions provide information or repair.
  • Otherwise, no fault mode changes.
  • Want minimum cost repair.

28
Cost-sensitive Fault Remediation
  • Cost-sensitive diagnosis
  • no state transitions; actions informational only
  • episode ends when a diagnosis is made
  • Cost-sensitive fault remediation
  • still no state change; repair is all or nothing
  • episode continues until objective achieved
  • Episodic Partially Observable MDP
  • actions change state and provide information
  • episode continues until objective achieved

29
Conceptual Model
  • Cost-sensitive fault remediation:
  • Set of underlying fault modes
  • Actions observe attributes, with costs
  • Actions remediate fault modes, with costs,
    repairing some, failing on others
  • (Greiner et al. 96; Turney 00; Guo 02; Zubek &
    Dietterich 02)

30
Example Observables
31
Example Remedial Actions, Faults
32
Cost-sensitive Classification
  • Decisions: what to observe, guess class
  • All classes are remedial.
  • Misclassification cost on par with observation
    cost:
  • Too cheap: pick most likely class
  • Too expensive: observe everything
  • (Turney)

33
CSFR Formal Definition
  • Set of fault modes C; observables T; remediation
    actions R
  • Fault probability distribution p(c) over c
  • Cost function j(t,c) for observing t in mode c
  • Observation model b(t,c): the probability of
    observing 1 using t in c
  • Remediation costs m(r,c) for taking r in c
  • Goal function G(r,c) returns 1 if r repairs c
  • For every c there is some r such that G(r,c) = 1.
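The CSFR tuple above maps naturally onto a small container. A minimal sketch with hypothetical field names, including a check of the closing condition that every fault mode has a repairing action:

```python
from dataclasses import dataclass

@dataclass
class CSFR:
    """Container for a cost-sensitive fault remediation model."""
    p: dict          # fault mode c -> prior probability p(c)
    obs_cost: dict   # (t, c) -> observation cost j(t, c)
    obs_model: dict  # (t, c) -> Pr(observe 1 | t, c), i.e. b(t, c)
    rem_cost: dict   # (r, c) -> remediation cost m(r, c)
    goal: dict       # (r, c) -> 1 if r repairs c, else 0

    def validate(self):
        """Check that every fault mode c has some r with G(r, c) = 1."""
        for c in self.p:
            if not any(v == 1 for (r, c2), v in self.goal.items() if c2 == c):
                raise ValueError(f"no repairing action for fault mode {c!r}")
        return True
```

The disk-diagnosis slides that follow are exactly an instance of this structure: fault modes with priors, priced tests, and priced repairs.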

34
Disk Diagnosis Example
  • Failure modes
  • Permanent disk failure
  • Partial disk failure
  • Transient disk error
  • Software bug
  • Cable disconnected
  • Cables crossed
  • SCSI controller failure
  • Tests, Remedial Actions (cost)
  • Disk online? 1
  • Disk ok? 600
  • All cables plugged in? 60
  • Other disk online? 1
  • Restart software 30
  • Operator replug any loose cables 70
  • Operator replug all cables 420
  • Replace disk 900
  • Replace SCSI controller 900

35
Disk Diagnosis Example
  • Observations
  • Disk Online?, Read?, Write?, Scan Disk?, Create
    File?
  • Remediation
  • Reboot, Replace (Copied) Disk, Remove Unwanted
    Files

36
Disk Diagnosis Faults
  • Faults MTTF Prob
  • Disk Hang 1yr 0.0400
  • Disk Timeout 5wks 0.4340
  • Read Failure (Bad Sector) Caught 35yrs 0.0001
  • Read Failure (Bad Sector) Not caught 0.0010
  • Write Failure (Bad Sector) Caught 35yrs 0.0001
  • Write Failure (Bad Sector) Not caught 0.0010
  • Disk Full 4wks 0.5230

37
Optimal Policy
  • Disk Online?
  • yes: Read?
  • yes: Create File?
  • yes: Replace Disk (Read Failure (Bad Sector) Not
    Caught, or Write Failure (Bad Sector) Caught, or
    Write Failure (Bad Sector) Not Caught)
  • no: Remove Unwanted Files (Disk Full)
  • no: Reboot (Disk Timeout)
  • failure: Replace Disk (Read Failure (Bad Sector)
    Caught)
  • no: Reboot (Disk Hang)

38
Optimal Disk Repair Policy
  • Disk online?
  • yes: Restart software
  • success: Done (Software bug, Transient disk
    error)
  • failure: Replace disk (Partial disk failure)
  • no: Other disk online?
  • yes: Operator replug any loose cables
  • success: Done (Cable disconnected)
  • failure: Operator replug all cables
  • success: Done (Cables crossed)
  • failure: Replace disk (Permanent disk failure)
  • no: Replace SCSI controller (SCSI controller
    failure)

39
Things to Notice
  • Different from people (gearing up for tests).
  • Some actions repair multiple faults.
  • Some remedial actions fail.
  • Each fault appears only once.
  • Closed world assumption.

40
Learning in CSFRs
  • Can plan in a CSFR model, but from where?
  • Fault modes unknown.
  • Episodes end with successful repair.
  • Necessarily incomplete information.
  • Instance-based approach:
  • Instances are the episodes' actions, results.
  • Instant matches a subset of instances.
  • Use matches to estimate model; plan.

41
A DP Planning Algorithm
  • Subset of the set of episodes s; observable
    o; remediation r.
  • Recursively compute:
  • Cost of optimal repair:
  • V(s) = min(mino Q(s,o), minr Q(s,r))
  • Cost of repair starting with observation o:
  • Q(s,o) = c(s,o) + Pr(o=1|s) V(s|o=1) +
    Pr(o=0|s) V(s|o=0)
  • Cost of repair starting with repair r:
  • Q(s,r) = c(s,r) + Pr(r=1|s) V(s|r=1)
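The recursion above can be implemented with memoization over subsets. A simplified sketch under stated assumptions: the state is the frozenset of fault modes still consistent with the evidence, tests are noiseless (each test reads 1 on a fixed set of modes), a successful repair ends the episode so only the failure branch carries continuation cost, and all names are hypothetical:

```python
from functools import lru_cache

def plan_csfr(modes, obs, repairs, prior):
    """Expected cost of an optimal CSFR policy via DP over subsets
    of fault modes.

    modes: iterable of fault-mode names.
    obs: dict test -> (cost, set of modes in which the test reads 1).
    repairs: dict action -> (cost, set of modes the action repairs).
    prior: dict mode -> prior probability p(c).
    """
    @lru_cache(maxsize=None)
    def V(s):
        mass = sum(prior[c] for c in s)
        best = float('inf')
        # Q(s, o): observe, then recurse on the two consistent subsets.
        for t, (cost, pos) in obs.items():
            s1 = s & frozenset(pos)
            s0 = s - frozenset(pos)
            if s1 and s0:  # only observations that split s are informative
                p1 = sum(prior[c] for c in s1) / mass
                best = min(best, cost + p1 * V(s1) + (1 - p1) * V(s0))
        # Q(s, r): repair; success ends the episode, failure recurses.
        for r, (cost, fixed) in repairs.items():
            rest = s - frozenset(fixed)
            if rest == s:
                continue  # r repairs nothing consistent with s
            p_fail = sum(prior[c] for c in rest) / mass
            best = min(best, cost + (p_fail * V(rest) if rest else 0.0))
        return best

    return V(frozenset(modes))
```

Skipping tests that do not split s and repairs that fix nothing in s is exactly the pruning described on the next slide, and it is what bounds the reachable state space by subsets of the true fault modes.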

42
Algorithm Analysis
  • Two important details
  • Choose only observables that provide information
    or have repaired faults.
  • If no information actions left, choose
    remediation.
  • Then, state-space size bounded by number of
    subsets of true fault modes.

43
Algorithmic Limitations
  • Optimal if few faults, (nearly) deterministic
    observables, easily explored.
  • Moving ahead
  • need smart exploration
  • better scaling with many (related) faults
  • handle noisy, continuous observables
  • deal with some side effects

44
Practical Considerations
  • CSFR: big improvement over classification.
  • Fault modes analogous to classes, but
    automatically identified; autonomous.
  • But, how to know if repair succeeded?
  • Primary complaint
  • Monitors
  • Triggers common to both (start with info).

45
Networking Demo Actions
  • Observables
  • Is the network medium physically connected?
  • Is the active interface wireless?
  • Is active interface DHCP-Enabled?
  • Ping my IP?
  • Ping localhost?
  • Ping Gateway?
  • DNS lookup
  • My IP setting looks valid
  • My netmask setting looks valid
  • My DNS setting looks valid
  • My gateway setting looks valid
  • Can I Reach PnP?
  • Can PnP reach DNS?

Remediation Actions:
  • Plug your network cable back in (or restore your
    wireless connection)
  • Renew DHCP lease
  • Fix IP Setting
  • Check your router's physical connection to your
    ISP
  • Check your router's physical connection to your
    LAN
  • Contact ISP and report that their DNS Server
    appears to be down
  • Contact ISP and report that you cannot reach
    their DNS Server
  • Contact ISP and report that you cannot reach
    pnphome.com
46
Network Demo Notes
  • My gateway setting looks valid.
  • My netmask setting looks valid.
  • Renew DHCP lease.

47
Conclusions
  • Instance-based advantages
  • efficient in experience (expensive commodity)
  • no catastrophic forgetting (rare events!)
  • memory is cheap
  • Instance-based state representation advantages
  • model-based planning and exploration explicit
  • state representation with minimal analysis
  • incrementally modifiable (discover structure)
  • reduce to previously researched problem!