Title: Thinking Cap 1 Declared a resounding success!
1. Thinking Cap 1 Declared a resounding success!
1/27
26 Comments!
16 Unique students
2. Administrative
- Final class tally
- Total: 43
- CSE 471: 31 (72%): 5 junior, 26 senior
- CSE 598: 12 (28%): 3 PhD, 9 MS
- Grader for the class
- Xin Sun (took CSE 471 in Fall 2007)
- Will work with Yunsong Meng
3. (Model-based reflex agents)
How do we write agent programs for these?
4. Even basic survival needs state information...
This one already assumes that the sensors → features mapping has been done!
5. (aka Model-based Reflex Agents)
State Estimation
EXPLICIT MODELS OF THE ENVIRONMENT
-- Blackbox models
-- Factored models
   - Logical models
   - Probabilistic models
6. State Estimation
Search / Planning
It is not always obvious what action to do now given a set of goals. You woke up in the morning. You want to attend a class. What should your action be?
- Search: find a path from the current state to the goal state; execute the first op.
- Planning: does the same for structured/non-blackbox state models.
7. Representation Mechanisms
- Logic (propositional, first order); Probabilistic logic
- Learning the models
- Search: Blind, Informed
- Planning
- Inference: Logical resolution, Bayesian inference
How the course topics stack up
8. ...certain inalienable rights: life, liberty and the pursuit of...
- Money?
- Daytime TV?
- Happiness (utility)
-- Decision Theoretic Planning
-- Sequential Decision Problems
9. Discounting
- The decision-theoretic agent often needs to assess the utility of sequences of states (also called behaviors).
- One technical problem is: how do we keep the utility of an infinite sequence finite?
- A closely related real problem is how we combine the utility of a future state with that of a current state (how does $15 tomorrow compare with $5000 when you retire?).
- The way both are handled is to have a discount factor r (0 < r < 1) and multiply the utility of the nth state by r^n:
  r^0 U(s_0) + r^1 U(s_1) + ... + r^n U(s_n) + ...
- Guaranteed to converge, since power series converge for 0 < r < 1.
- r is set by the individual agents based on how they think future rewards stack up against the current ones.
- An agent that expects to live longer may use a larger r than one that expects to live shorter.
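A minimal Python sketch of this computation (the reward sequence and the function name are illustrative, not from the slides): it sums r^n U(s_n) over a finite prefix and shows how the choice of r changes how much a delayed reward is worth.

def discounted_utility(rewards, r):
    """Sum of r**n * U(s_n) over a (finite prefix of a) state sequence.
    For 0 < r < 1 the infinite sum stays bounded by max|U| / (1 - r),
    which is the convergence fact the slide appeals to."""
    return sum((r ** n) * u for n, u in enumerate(rewards))

# A far-sighted agent (large r) values a delayed payoff much more than
# a short-sighted one (small r):
sequence = [0, 0, 0, 100]                      # reward arrives at the 4th state
print(discounted_utility(sequence, 0.9))       # about 72.9
print(discounted_utility(sequence, 0.3))       # about 2.7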
10. Learning
Dimensions:
- What can be learned?
  -- Any of the boxes representing the agent's knowledge
  -- Action descriptions, effect probabilities, causal relations in the world (and the probabilities of causation), utility models (sort of, through credit assignment), sensor-data interpretation models
- What feedback is available?
  -- Supervised, unsupervised, reinforcement learning
  -- Credit assignment problem
- What prior knowledge is available?
  -- Tabula rasa (the agent's head is a blank slate) or pre-existing knowledge
11. Problem Solving Agents (Search-based Agents)
14. The important difference from the graph-search scenario you learned in CSE 310 is that you want to keep the graph implicit rather than explicit (i.e., generate only that part of the graph that is absolutely needed to get the optimal path).
-- VERY important, since for most problems the graphs are ginormous, tending to infinite...
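A hedged Python sketch of what keeping the graph implicit means: the search never builds an adjacency list; it only asks a successor function for the neighbors of the node it is currently expanding. The toy problem (reach a target integer from 1 using +3 and *2) is purely illustrative.

from collections import deque

def successors(n):
    # The graph is defined implicitly by this function; the (infinite)
    # graph over the integers is never materialized.
    return [n + 3, n * 2]

def implicit_bfs(start, is_goal, max_nodes=100000):
    frontier = deque([(start, [start])])
    seen = {start}
    while frontier and len(seen) < max_nodes:
        state, path = frontier.popleft()
        if is_goal(state):
            return path
        for nxt in successors(state):      # nodes generated only on demand
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [nxt]))
    return None

print(implicit_bfs(1, lambda n: n == 11))  # [1, 4, 8, 11]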
17. What happens when the domain is inaccessible?
18. Search in the Multi-state (inaccessible) version
- Notice that actions can sometimes reduce state uncertainty.
- Sensing reduces state uncertainty.
- The set of states the agent might be in is called a belief state, so we are searching in the space of belief states.
- The space of belief states is exponentially larger than the space of states; if you throw in the likelihood of states in a belief state, the resulting state space is infinite!
19. Will we really need to handle multiple-state problems?
- Can't we just buy better cameras, so our agents can always tell what state they are in?
- It is not just a question of having a good pair of eyes... Otherwise, why do malls have maps of the mall with a "here you are" annotation on the map?
- The problem of localizing yourself in a map is a non-trivial one...
20. State spaces with non-deterministic actions correspond to hyper-graphs, but can be made graphs in the belief space
[Figure: a hyper-graph over states s1..s5 with actions a1, a2, a3, and the corresponding ordinary graph over belief states such as {s1, s3, s4} and {s1, s3, s4, s2}]
Solution: If in s4 do a2; if in s2 do a3; if in s2 do a1
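A small Python sketch (the state names s1..s5 and the particular transitions are made-up placeholders echoing the figure): progressing a belief state through a non-deterministic action is just a union over the possible outcomes, which is exactly what turns the hyper-graph over states into an ordinary graph over belief states.

# Non-deterministic model: action -> {state: set of possible next states}.
# These transitions are invented for illustration only.
transitions = {
    "a3": {"s1": {"s1", "s3"}, "s2": {"s4"}},
    "a1": {"s1": {"s2"}, "s3": {"s5"}, "s4": {"s5"}},
}

def progress(belief, action):
    """Successor belief state: union of possible outcomes over every
    state the agent might currently be in."""
    result = set()
    for s in belief:
        result |= transitions[action].get(s, {s})   # unlisted states stay put
    return frozenset(result)

b0 = frozenset({"s1", "s2"})
b1 = progress(b0, "a3")    # {'s1', 's3', 's4'}
b2 = progress(b1, "a1")    # {'s2', 's5'}
print(b1, b2)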
21. Medicate without killing...
- A healthy (and alive) person accidentally walked into the Springfield nuclear plant and got irradiated, which may or may not have given her a disease D.
- The medication M will cure her of D if she has it; otherwise, it will kill her.
- There is a test T which, when done on patients with disease D, turns their tongues red (R).
- You can observe with Look sensors to see if the tongue is pink or not.
- We want to cure the patient without killing her...
[Figure: belief-state tree for the problem. Radiate yields a belief state containing both the diseased and the healthy case, so applying Medicate directly is unsafe. The test makes the tongue red exactly when D holds, and the "Is tongue red?" observation (y/n) partitions the belief state, after which Medicate can be applied safely.]
Sensing partitions the belief state.
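As a hedged sketch (the simulation code is a reconstruction of the story told by the bullets above, not the slide's own), the contingent plan the figure arrives at can be written down directly: test, sense the tongue, and medicate only on the branch where the tongue is red.

import random

def patient():
    # Radiation may or may not have caused disease D; the agent cannot see D.
    return {"D": random.choice([True, False]), "alive": True}

def test(p):
    # Test T turns the tongue red (R) exactly when the patient has D.
    p["R"] = p["D"]

def medicate(p):
    # Medication M cures D if present; otherwise it kills the patient.
    if p["D"]:
        p["D"] = False
    else:
        p["alive"] = False

def contingent_plan(p):
    test(p)
    if p["R"]:          # sensing partitions the belief state
        medicate(p)     # safe: on this branch the patient definitely has D
    return p

for _ in range(5):
    print(contingent_plan(patient()))   # always alive, never diseased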
22. Unknown State Space
- When you buy a Roomba, does it have the layout of your home?
- Fat chance! For $200, they aren't going to customize it to everyone's place!
- When the map is not given, the robot needs to both learn the map and achieve the goal.
- Integrates search/planning and learning.
- Exploration/Exploitation tradeoff (a toy sketch follows below).
- Should you bother learning more of the map when you have already found a way of satisfying the goal?
- (At the end of elementary school, should you go ahead and exploit the 5 years of knowledge you gained by taking up a job, or explore a bit more by doing high school, college, grad school, post-doc?)
Most relevant sub-area: Reinforcement learning
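A toy Python sketch of the exploration/exploitation tradeoff (epsilon-greedy action selection on a made-up 3-armed bandit; the setup is illustrative and not from the slides): with probability epsilon the agent keeps exploring, otherwise it exploits the best option found so far.

import random

def epsilon_greedy(true_means, epsilon=0.1, steps=2000):
    """k-armed bandit: estimate each arm's value from samples, explore
    with probability epsilon, exploit the best estimate otherwise."""
    k = len(true_means)
    estimates, counts, total = [0.0] * k, [0] * k, 0.0
    for _ in range(steps):
        if random.random() < epsilon:
            arm = random.randrange(k)                          # explore
        else:
            arm = max(range(k), key=lambda a: estimates[a])    # exploit
        reward = random.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total / steps

print(epsilon_greedy([0.2, 0.5, 0.9]))   # average reward approaches ~0.9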
23. Utility of eyes (sensors) is reflected in the size of the effective search space!
In general the solution is a subgraph rather than a tree (loops may be needed; consider closing a faulty door).
Given a state space of size n (or 2^v, where v is the number of state variables): the single-state problem searches for a path in a graph of size n (2^v); the multiple-state problem searches for a path in a graph of size 2^n (2^(2^v)); the contingency problem searches for a sub-graph in a graph of size 2^n (2^(2^v)).
2^n is the EVIL that every CS student's nightmares should be made of.
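A quick back-of-the-envelope computation (illustrative numbers only) makes the 2^n versus 2^(2^v) point concrete:

for v in range(1, 6):
    n = 2 ** v                 # states for v boolean state variables
    beliefs = 2 ** n           # belief states = subsets of the state space
    print(f"v={v}: {n} states, {beliefs} belief states")
# v=5 already gives 2**32 (about 4.3 billion) belief states; a modest
# v=30 would give 2**(2**30), which is utterly beyond enumeration.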
29. Example: Robotic Path Planning
- States: free-space regions
- Operators: movement to neighboring regions
- Goal test: reaching the goal region
- Path cost: number of movements (distance traveled)
[Figure: workspace with initial region I and goal region G]
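A minimal sketch of how the formulation above could be encoded as a search problem in Python (the class name, region names and adjacency are assumptions made for illustration):

class RegionPathProblem:
    """Path planning over free-space regions, stated as a search problem."""
    def __init__(self, adjacency, start, goal):
        self.adjacency = adjacency        # region -> neighboring free regions
        self.start = start
        self.goal_region = goal

    def actions(self, region):            # operators: move to a neighboring region
        return self.adjacency[region]

    def goal_test(self, region):          # have we reached the goal region?
        return region == self.goal_region

    def step_cost(self, region, neighbor):
        return 1                          # path cost = number of movements

# Hypothetical layout with initial region I and goal region G:
problem = RegionPathProblem(
    {"I": ["r1"], "r1": ["I", "r2", "r3"], "r2": ["r1"],
     "r3": ["r1", "G"], "G": ["r3"]},
    start="I", goal="G")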
31. 1/29
- January 29, 2009
- Mars Rover Disoriented Somewhat After Glitch
- By KENNETH CHANG
- On the 1,800th Martian day of its mission, NASA's Spirit rover blanked out, and it remains a bit disoriented.
- Mission managers at NASA's Jet Propulsion Laboratory in Pasadena, Calif., reported Wednesday that the Spirit had behaved oddly on Sunday, the 1,800th Sol, or Martian day, since Spirit's landing on Mars in January 2004.
- (A Martian Sol is 39.5 minutes longer than an Earth day. The Spirit and its twin, the Opportunity, were designed to last just 90 Sols each, but both continue to operate more than five years later.)
- On that day, the Spirit acknowledged receiving its driving directions from Earth, but it did not move.
- More strangely, the Spirit had no memory of what it had done for that part of Sol 1800. The rover did not record actions, as it otherwise always does, to the part of its computer memory that retains information even when power is turned off, the so-called nonvolatile memory. "It's almost as if the rover had a bout of amnesia," said John Callas, the project manager for the rovers.
- Another rover system did record that power was being drawn from the batteries for an hour and a half. "Meaning the rover is awake doing something," Dr. Callas said. But before-and-after images showed that the Spirit ended the day exactly where it started.
- On Monday, mission controllers told the Spirit to orient itself by locating the Sun in the sky with its camera, and it reported that it had been unable to do so. Dr. Callas said the camera did actually photograph the Sun, but it was not quite in the position the rover expected.
- One hypothesis is that a cosmic ray hit the electronics and scrambled the rover's memory. On Tuesday, the rover's nonvolatile memory worked properly.
- The Spirit now reports to be in good health and responds to commands from Earth.
33. General Search
35. Search algorithms differ based on the specific queuing function they use.
All search algorithms must do the goal test only when the node is picked up for expansion.
We typically analyze properties of search algorithms on uniform trees -- with uniform branching factor b and goal depth d (the tree itself may go to depth d_t).
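A hedged Python skeleton of this point: one generic search routine, where only the queuing function changes between algorithms, and the goal test is applied only when a node is picked up for expansion. The function names are illustrative.

def generic_search(start, successors, is_goal, enqueue):
    """Generic tree search; `enqueue(frontier, children)` is the queuing
    function that distinguishes BFS, DFS, uniform-cost search, A*, etc."""
    frontier = [(start, [start])]
    while frontier:
        state, path = frontier.pop(0)
        if is_goal(state):                 # goal test only at expansion time
            return path
        children = [(s, path + [s]) for s in successors(state)]
        enqueue(frontier, children)
    return None

def bfs_enqueue(frontier, children):
    frontier.extend(children)              # FIFO: breadth-first

def dfs_enqueue(frontier, children):
    frontier[:0] = children                # LIFO: depth-first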
36. Evaluating
For the tree below, b = 3, d = 4
40. Breadth-first search on a uniform tree with b = 10
Assume 1000 nodes expanded/sec, 100 bytes/node
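Under the slide's assumptions (b = 10, 1000 nodes expanded per second, 100 bytes per node), a rough Python tabulation of how time and memory scale with goal depth (an illustrative calculation):

b, nodes_per_sec, bytes_per_node = 10, 1000, 100
for d in (2, 4, 6, 8):
    nodes = sum(b ** i for i in range(d + 1))   # 1 + b + ... + b**d
    seconds = nodes / nodes_per_sec
    megabytes = nodes * bytes_per_node / 1e6
    print(f"depth {d}: {nodes:.1e} nodes, {seconds:.0f} s, {megabytes:.0f} MB")
# Depth 8 already needs about 1.1e8 nodes: roughly 31 hours of expansion
# and about 11 GB of memory.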
43. Qn: Is there a way of getting linear-memory search that is complete and optimal?
44. The search is now complete (since there is a finite space to be explored), but still suboptimal.
46. IDDFS Review
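For reference, a compact Python sketch of iterative-deepening DFS (the recursion structure is standard; the example problem is the illustrative +3 / *2 graph used earlier):

def depth_limited(state, successors, is_goal, limit, path):
    """DFS that goes at most `limit` steps deeper; returns a path or None."""
    if is_goal(state):
        return path
    if limit == 0:
        return None
    for child in successors(state):
        found = depth_limited(child, successors, is_goal, limit - 1, path + [child])
        if found is not None:
            return found
    return None

def iddfs(start, successors, is_goal, max_depth=50):
    """Depth-limited search with limits 0, 1, 2, ...: linear memory like
    DFS, yet complete and (for unit step costs) optimal like BFS."""
    for limit in range(max_depth + 1):
        result = depth_limited(start, successors, is_goal, limit, [start])
        if result is not None:
            return result
    return None

print(iddfs(1, lambda n: [n + 3, n * 2], lambda n: n == 11))   # [1, 4, 8, 11]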
49. [Figure: example search graph with nodes A, B, C, D and goal G]
51. Search on undirected graphs, or directed graphs with cycles: cycles galore!
[Figure: the same example graph with nodes A, B, C, D and goal G]
52. Graph (instead of tree) Search: handling repeated nodes
Main points:
-- Repeated expansion is a bigger issue for DFS than for BFS or IDDFS
-- Trying to remember all previously expanded nodes and comparing the new nodes with them is infeasible
   -- Space becomes exponential
   -- Duplicate checking can also be exponential
-- Partial reduction in repeated expansion can be done by (a sketch follows below):
   -- Checking to see if any children of a node n have the same state as the parent of n
   -- Checking to see if any children of a node n have the same state as any ancestor of n (at most d ancestors for n, where d is the depth of n)
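A hedged sketch of the cheaper partial check described above: each child is compared only against the ancestors on its own path (at most d of them), rather than against everything ever expanded. The example graph (A, B, C, D, G) and its edges are illustrative.

def dfs_no_ancestor_cycles(state, successors, is_goal, path=None):
    """Depth-first search that refuses to revisit any state already on the
    current path: O(d) extra work per child instead of remembering all
    previously expanded nodes."""
    if path is None:
        path = [state]
    if is_goal(state):
        return path
    for child in successors(state):
        if child in path:                  # same state as an ancestor: prune
            continue
        found = dfs_no_ancestor_cycles(child, successors, is_goal, path + [child])
        if found is not None:
            return found
    return None

# A small undirected graph with cycles galore (edges invented for illustration):
graph = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"],
         "D": ["C", "G"], "G": ["D"]}
print(dfs_no_ancestor_cycles("A", lambda s: graph[s], lambda s: s == "G"))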