Title: Autonomous Mobile Robots CPE 470/670
1. Autonomous Mobile Robots, CPE 470/670
- Lecture 13
- Instructor: Monica Nicolescu
2. Review
- Hybrid control
- Selection, Advising, Adaptation, Postponing
- AuRA, Atlantis, Planner-Reactor, PRS, many others
- Adaptive behavior
- Adaptation vs. learning
- Challenges
- Types of learning algorithms
3. Learning Methods
- Reinforcement learning
- Neural network (connectionist) learning
- Evolutionary learning
- Learning from experience
- Memory-based
- Case-based
- Learning from demonstration
- Inductive learning
- Explanation-based learning
- Multistrategy learning
4. Reinforcement Learning (RL)
- Motivated by psychology (the Law of Effect, Thorndike 1911)
- Applying a reward immediately after the occurrence of a response increases its probability of recurring, while providing punishment after the response decreases that probability
- One of the most widely used methods for adaptation in robotics
5. Reinforcement Learning
- Combinations of stimuli (i.e., sensory readings and/or state) and responses (i.e., actions/behaviors) are given positive/negative reward in order to increase/decrease their probability of future use
- Desirable outcomes are strengthened and undesirable outcomes are weakened
- A critic evaluates the system's response and applies reinforcement
- external: the user provides the reinforcement
- internal: the system itself provides the reinforcement (reward function)
6. Decision Policy
- The robot can observe the state of the environment
- The robot has a set of actions it can perform
- Policy: a state/action mapping that determines which actions to take (see the sketch after this list)
- Reinforcement is applied based on the results of the actions taken
- Utility: the function that gives a utility value to each state
- Goal: learn an optimal policy that chooses the best action for every set of possible inputs
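As a minimal illustration of these terms, here is a sketch in Python; the states, actions, and utility values are hypothetical examples, not part of the lecture:

    # Policy and utility for a hypothetical discrete world.
    states = ["at_door", "in_hall", "at_goal"]
    actions = ["forward", "turn_left", "turn_right"]

    # Utility: assigns a value to each state (higher = more desirable).
    utility = {"at_door": 0.2, "in_hall": 0.5, "at_goal": 1.0}

    # Policy: a state -> action mapping.
    policy = {"at_door": "forward", "in_hall": "turn_left", "at_goal": "forward"}

    def act(state):
        """Look up the action prescribed by the current policy."""
        return policy[state]

    print(act("in_hall"))  # -> turn_left

Learning then amounts to adjusting this mapping, based on reinforcement, until it picks the best action in every state.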
7. Unsupervised Learning
- RL is an unsupervised learning method
- No target goal state
- Feedback only provides information on the quality of the system's response
- Simple: binary fail/pass
- Complex: numerical evaluation
- Through RL a robot learns on its own, using its own experiences and the feedback received
- The robot is never told what to do
8. Challenges of RL
- Credit assignment problem
- When something good or bad happens, what exact state/condition-action/behavior should be rewarded or punished?
- Learning from delayed rewards
- It may take a long sequence of actions that receive insignificant reinforcement to finally arrive at a state with high reinforcement
- How can the robot learn from reward received at some time in the future?
9. Challenges of RL
- Exploration vs. exploitation
- Explore unknown states/actions or exploit states/actions already known to yield high rewards
- Partially observable states
- In practice, sensors provide only partial information about the state
- Choose actions that improve the observability of the environment
- Life-long learning
- In many situations it may be required that robots learn several tasks within the same environment
10. Types of RL Algorithms
- Adaptive Heuristic Critic (AHC)
- Learning the policy is separate from learning the utility function that the critic uses for evaluation
- Idea: try different actions in different states and observe the outcomes over time
11. Q-Learning
- Watkins (1980s)
- A single utility function, the Q-function, is learned to evaluate both actions and states
- Q values are stored in a table
- Updated at each step, using the following rule:
- Q(x,a) ← Q(x,a) + α (r + γ E(y) − Q(x,a))
- x = state, a = action, α = learning rate, r = reward
- γ = discount factor in (0,1)
- E(y) is the utility of the new state y: E(y) = max Q(y,a) over all actions a
- Guaranteed to converge to the optimal solution, given infinite trials (a code sketch follows below)
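A minimal tabular Q-learning sketch in Python, assuming a hypothetical one-dimensional corridor world (five states, reward only at the right end); the environment, the constants, and the epsilon-greedy exploration scheme are illustrative choices, not part of the lecture:

    import random

    N_STATES = 5                 # states 0..4 of the hypothetical corridor
    ACTIONS = [-1, +1]           # move left / move right
    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2   # assumed learning constants

    # Q-table: one entry per (state, action) pair, as on the slide.
    Q = {(x, a): 0.0 for x in range(N_STATES) for a in ACTIONS}

    def step(x, a):
        """Hypothetical environment: deterministic move, reward at the right end."""
        y = min(max(x + a, 0), N_STATES - 1)
        return y, (1.0 if y == N_STATES - 1 else 0.0)

    for _ in range(200):                      # episodes
        x = 0
        while x != N_STATES - 1:
            # Epsilon-greedy: explore occasionally, otherwise exploit.
            if random.random() < EPSILON:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[(x, act)])
            y, r = step(x, a)
            E_y = max(Q[(y, act)] for act in ACTIONS)        # E(y) = max Q(y,a)
            Q[(x, a)] += ALPHA * (r + GAMMA * E_y - Q[(x, a)])  # update rule
            x = y

The update line is a direct transcription of the rule above; after enough episodes the greedy policy moves right from every state.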
12. Learning to Walk
- Maes & Brooks (1990)
- Genghis hexapod robot
- Learned stable tripod stance and tripod gait
- Rule-based subsumption controller
- Two sensor modalities for feedback:
- Two touch sensors to detect hitting the floor: negative feedback
- Trailing wheel to measure progress: positive feedback
13. Learning to Walk
- Nate Kohl & Peter Stone (2004)
14. Learning to Push
- Mahadevan & Connell (1991)
- OBELIX: 8 ultrasonic sensors, 1 IR sensor, motor current
- Learned how to push a box (Q-learning)
- Motor outputs grouped into 5 choices: move forward, turn left or right (22 degrees), sharp turn left/right (45 degrees)
- 250,000 states
15. Supervised Learning
- Supervised learning requires the user to give the exact solution to the robot, in the form of the error direction and magnitude
- The user must know the exact desired behavior for each situation
- Supervised learning involves training, which can be very slow: the user must supervise the system with numerous examples
16. Neural Networks
- One of the most widely used supervised learning methods
- Used for approximating real-valued and vector-valued target functions
- Inspired from biology: learning systems are built from complex networks of interconnected neurons
- The goal is to minimize the error between the network output and the desired output
- This is achieved by adjusting the weights on the network connections
17. Training Neural Networks
- Hebbian learning
- Increases synaptic strength along the neural pathways associated with a stimulus and a correct response (see the sketch after this list)
- Perceptron learning
- Delta Rule for networks without hidden layers
- Back-propagation for multi-layer networks
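A minimal sketch of the Hebbian idea in Python; the activations and learning rate are illustrative values, not from the lecture:

    eta = 0.1                      # learning rate (assumed value)
    x = [1.0, 0.0, 1.0]            # pre-synaptic activations (stimulus)
    y = 1.0                        # post-synaptic activation (response)
    w = [0.0, 0.0, 0.0]            # synaptic weights

    # Hebbian rule: dw_i = eta * x_i * y -- strengthen exactly those
    # pathways where stimulus and response are active together.
    w = [w_i + eta * x_i * y for w_i, x_i in zip(w, x)]
    print(w)   # -> [0.1, 0.0, 0.1]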
18. Perceptron Learning
- Repeat:
- Present an example from a set of positive and negative learning experiences
- Verify whether the output of the network is correct or incorrect
- If it is incorrect, supply the correct output at the output unit
- Adjust the synaptic weights of the perceptrons in a manner that reduces the error between the observed output and the correct output
- Until satisfactory performance (convergence or a stopping condition is met)
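The loop above translates almost line for line into code. A minimal sketch in Python, assuming a tiny hypothetical training set (the logical AND of two inputs), which a single perceptron can learn:

    # Training set: (inputs, target output) pairs -- illustrative example.
    examples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
    w, b, eta = [0.0, 0.0], 0.0, 0.1      # weights, bias, learning rate

    def output(x):
        """Threshold unit: fire iff the weighted sum exceeds zero."""
        return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

    converged = False
    while not converged:                   # Repeat ...
        converged = True
        for x, target in examples:         # present an example
            y = output(x)                  # verify the output
            if y != target:                # if incorrect, supply the correct
                err = target - y           # output and adjust the weights
                w = [w_i + eta * err * x_i for w_i, x_i in zip(w, x)]
                b += eta * err
                converged = False
    print(w, b)                            # ... until convergence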
19. ALVINN
- ALVINN (Autonomous Land Vehicle In a Neural Network)
- Dean Pomerleau (1991)
- Pittsburgh to San Diego: 98.2% autonomous
20. Learning from Demonstration and RL
- S. Schaal (1997)
- Pole balancing, pendulum swing-up
21. Learning from Demonstration
- Inspiration
- Human-like teaching by demonstration
[Figures: demonstration; robot performance]
22. Learning to Slalom
[Figures: demonstration; robot performance]
23. Learning from Robot Teachers
- Transfer of task knowledge from humans to robots
[Figures: human demonstration; robot performance]
24. Classical Conditioning
- Pavlov (1927)
- Assumes that unconditioned stimuli (e.g., food) automatically generate an unconditioned response (e.g., salivation)
- A conditioned stimulus (e.g., ringing a bell) can, over time, become associated with the unconditioned response
25. Darwin VII
- G. Edelman et al.
- Darwin VII sensors:
- CCD camera
- Gripper that senses conductivity
- IR sensors
- Darwin VII actuators:
- PTZ camera
- Wheels
- Gripper
- Low-reflectivity walls and floor
- Two types of stimulus blocks, 6 cm metallic cubes:
- Blobs: low conductivity (bad taste)
- Stripes: high conductivity (good taste)
26. Darwin's Perceptual Categorization
[Figures: early training; after the 10th stimulus]
- Instead of hard-wiring stimulus-response rules, these associations are developed over time
27. Genetic Algorithms
- Inspired from evolutionary biology
- Individuals in a population have a particular fitness with respect to a task
- Individuals with the highest fitness are kept as survivors
- Individuals with poor performance are discarded: the process of natural selection
- Evolutionary process: search through the space of solutions to find the one with the highest fitness
28. Genetic Operators
- Knowledge is encoded as bit strings (chromosomes)
- Each bit represents a gene
- Biologically inspired operators are applied to yield better generations (see the sketch below)
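A minimal genetic-algorithm sketch in Python over bit-string chromosomes. The fitness function (count of 1-bits), population size, and mutation rate are illustrative assumptions; crossover and mutation stand in for the biologically inspired operators:

    import random

    GENES, POP, GENERATIONS, P_MUT = 16, 20, 50, 0.02

    def fitness(chrom):
        """Hypothetical fitness: the number of 1-genes in the chromosome."""
        return sum(chrom)

    def crossover(a, b):
        """Single-point crossover: splice two parent chromosomes."""
        cut = random.randrange(1, GENES)
        return a[:cut] + b[cut:]

    def mutate(chrom):
        """Flip each gene with a small probability."""
        return [g ^ 1 if random.random() < P_MUT else g for g in chrom]

    # Random initial population of bit strings.
    pop = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]
    for _ in range(GENERATIONS):
        # Natural selection: the fittest half are kept as survivors.
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:POP // 2]
        # Offspring of random survivor pairs refill the population.
        children = [mutate(crossover(random.choice(survivors),
                                     random.choice(survivors)))
                    for _ in range(POP - len(survivors))]
        pop = survivors + children
    print(max(fitness(c) for c in pop))   # best fitness found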
29. Classifier Systems
- ALECSYS system
- Learns new behaviors and their coordination
- Genetic operators act upon a set of rules encoded as bit strings
- Demonstrated tasks:
- Phototaxis
- Coordination of approaching, chasing, and escaping behaviors by combination, suppression, and sequencing
30. Evolving Structure and Control
- Karl Sims (1994)
- Evolved morphology and control for virtual creatures performing swimming, walking, jumping, and following
- Genotypes encoded as directed graphs are used to produce 3D kinematic structures
- Genotypes encode points of attachment
- Sensors used: contact sensors, joint-angle sensors, and photosensors
31. Evolving Structure and Control
- Jordan Pollack
- Real structures
32. Fuzzy Control
- Fuzzy control produces actions using a set of fuzzy rules based on fuzzy logic
- In fuzzy logic, variables take values based on how much they belong to a particular fuzzy set
- Fast, slow, far, near: not crisp values!
- A fuzzy logic control system consists of:
- Fuzzifier: maps sensor readings to fuzzy input sets
- Fuzzy rule base: a collection of IF-THEN rules
- Fuzzy inference: maps fuzzy sets to other fuzzy sets according to the rule base
- Defuzzifier: maps fuzzy outputs to crisp actuator commands
33. Examples of Fuzzy Control
- Flakey the robot
- Behaviors are encoded as collections of fuzzy rules, e.g. (sketched in code below):
- IF obstacle-close-in-front AND NOT obstacle-close-on-left
- THEN turn sharp-left
- Each behavior may be active to a varying degree
- Behavior responses are blended smoothly
- Multiple goals can be pursued
- Systems for learning fuzzy rules have also been developed
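A minimal sketch of the fuzzify / infer / defuzzify pipeline in Python, built around the Flakey-style rule above. The membership function, the turn rates, and the min/complement operators for AND/NOT are common textbook choices, assumed here rather than taken from the lecture:

    def close(distance_m):
        """Fuzzifier: membership in the fuzzy set 'close' (1 at 0 m, 0 past 2 m)."""
        return max(0.0, min(1.0, (2.0 - distance_m) / 2.0))

    def rule_turn_sharp_left(front_m, left_m):
        """IF obstacle-close-in-front AND NOT obstacle-close-on-left
        THEN turn sharp-left.  Inference: AND = min, NOT = complement."""
        return min(close(front_m), 1.0 - close(left_m))

    def defuzzify(activation, sharp_left=1.5, straight=0.0):
        """Defuzzifier: blend crisp turn rates (rad/s) by rule activation."""
        return activation * sharp_left + (1.0 - activation) * straight

    activation = rule_turn_sharp_left(front_m=0.5, left_m=1.8)
    print(defuzzify(activation))   # crisp steering command

Because the rule fires to a degree rather than on/off, the output turn rate varies smoothly with the sensor readings, which is what allows behavior responses to be blended.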
34. Where Next?
35. Fringe Robotics: Beyond Behavior
- Questions for the future:
- Human-like intelligence
- Robot consciousness
- Complete autonomy of complex thought and action
- Emotions and imagination in artificial systems
- Nanorobotics
- Successor to human beings
36. A Robot Mind
- The goal of AI is to build artificial minds
- What is the mind?
- "The mind is what the brain does." (M. Minsky)
- The mind includes:
- thinking
- feeling
37. Computational Thought
- What does it mean for a machine to think?
- Bellman:
- Thought is not well defined, so we cannot ascribe or judge it
- Computers can perform processes representative of human thought: decision making, learning
- Albus:
- For robots to understand humans, they must be indistinguishable from humans in bodily appearance and in physical and mental development
- Brooks:
- Thought and consciousness need not be programmed in; they will emerge
38. The Turing Test
- Developed by the mathematician Alan Turing
- Original version of the Turing Test:
- Two people (a man and a woman) are put in separate closed rooms. A third person can interact with each of the two through writing (no voices).
- Can the third person tell the difference between the man and the woman?
39. The Turing Test
- AI version of the Turing Test:
- A person sits in front of two terminals: at one end is a human, at the other is a computer. The questioner is free to ask any questions of the respondents at the other ends of the terminals.
- If the questioner cannot tell the difference between the computer and the human subject, the computer has passed the Turing Test!
40. The Turing Test
- A Turing Test contest is held annually, and it carries a $100,000 award for anybody who passes it
- No computer so far has truly passed the Turing Test
- Is this a good test of intelligence?
- Thought is defined based on human fallibility rather than on machine consciousness
- Many researchers oppose using this test as a proof of intelligence
41. Penrose's Critique
- Roger Penrose (The Emperor's New Mind, Shadows of the Mind), a British physicist, is a famous critic of AI
- Intelligence is a consequence of neural activity and interactions in the brain
- Computers can only simulate this activity, but this is not sufficient for true intelligence
- Intelligence requires understanding, and understanding requires awareness, an aspect of consciousness
- Many refuting arguments have been given
42. They're Made Out Of Meat
Terry Bisson
- "They're made out of meat."
- "Meat?"
- "Meat. They're made out of meat."
- "Meat?"
- "There's no doubt about it. We picked several from different parts of the planet, took them aboard our recon vessels, probed them all the way through. They're completely meat."
- "That's impossible. What about the radio signals? The messages to the stars."
- "They use the radio waves to talk, but the signals don't come from them. The signals come from machines."
- "So who made the machines? That's who we want to contact."
43. They're Made Out Of Meat
Terry Bisson
- "They made the machines. That's what I'm trying to tell you. Meat made the machines."
- "That's ridiculous. How can meat make a machine? You're asking me to believe in sentient meat."
- "I'm not asking you, I'm telling you. These creatures are the only sentient race in the sector and they're made out of meat."
- "Maybe they're like the Orfolei. You know, a carbon-based intelligence that goes through a meat stage."
- "Nope. They're born meat and they die meat. We studied them for several of their life spans, which didn't take too long. Do you have any idea what's the life span of meat?"
- "Spare me. Okay, maybe they're only part meat. You know, like the Weddilei. A meat head with an electron plasma brain inside."
44. They're Made Out Of Meat
Terry Bisson
- "Nope. We thought of that, since they do have meat heads like the Weddilei. But I told you, we probed them. They're meat all the way through."
- "No brain?"
- "Oh, there is a brain all right. It's just that the brain is made out of meat!"
- "So... what does the thinking?"
- "You're not understanding, are you? The brain does the thinking. The meat."
- "Thinking meat! You're asking me to believe in thinking meat!"
- "Yes, thinking meat! Conscious meat! Loving meat. Dreaming meat. The meat is the whole deal! Are you getting the picture?"
45. Conclusion
- Lots of remaining interesting problems to explore!
- Get involved!
46. Readings