Autonomous Mobile Robots CPE 470/670

Slides: 46
Provided by: monicani
Learn more at: https://www.cse.unr.edu


Transcript and Presenter's Notes



1
Autonomous Mobile Robots CPE 470/670
  • Lecture 13
  • Instructor: Monica Nicolescu

2
Review
  • Hybrid control
  • Selection, Advising, Adaptation, Postponing
  • AuRA, Atlantis, Planner-Reactor, PRS, many others
  • Adaptive behavior
  • Adaptation vs. learning
  • Challenges
  • Types of learning algorithms

3
Learning Methods
  • Reinforcement learning
  • Neural network (connectionist) learning
  • Evolutionary learning
  • Learning from experience
  • Memory-based
  • Case-based
  • Learning from demonstration
  • Inductive learning
  • Explanation-based learning
  • Multistrategy learning

4
Reinforcement Learning (RL)
  • Motivated by psychology (the Law of Effect,
    Thorndike 1911)
  • Applying a reward immediately after the
    occurrence of a response increases its
    probability of reoccurring, while applying
    punishment after the response decreases that
    probability
  • One of the most widely used methods for
    adaptation in robotics

5
Reinforcement Learning
  • Combinations of stimuli (i.e., sensory readings
    and/or state) and responses (i.e.,
    actions/behaviors) are given positive/negative
    reward in order to increase/decrease their
    probability of future use
  • Desirable outcomes are strengthened and
    undesirable outcomes are weakened
  • Critic: evaluates the system's response and
    applies reinforcement
  • External: the user provides the reinforcement
  • Internal: the system itself provides the
    reinforcement (reward function)
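The idea of strengthening and weakening stimulus-response pairs can be sketched as a table of strengths, one per (stimulus, response) pair, where a response is chosen with probability proportional to its strength and the critic's reward is added to the strength of the pair that was used. This is a minimal illustration under assumed choices: the class name `StimulusResponseTable`, the additive update, the stimulus and response names, and the clipping floor are all hypothetical, not a specific algorithm from the lecture.

```python
import random
from collections import defaultdict


class StimulusResponseTable:
    """Hypothetical sketch: one strength value per (stimulus, response) pair."""

    def __init__(self, responses, initial_strength=1.0, min_strength=0.01):
        self.responses = list(responses)
        self.min_strength = min_strength
        self.strength = defaultdict(lambda: initial_strength)

    def probabilities(self, stimulus):
        # Each response is chosen with probability proportional to its strength.
        weights = [self.strength[(stimulus, r)] for r in self.responses]
        total = sum(weights)
        return {r: w / total for r, w in zip(self.responses, weights)}

    def choose(self, stimulus, rng=random):
        # Sample a response for this stimulus according to those probabilities.
        probs = self.probabilities(stimulus)
        return rng.choices(self.responses, weights=[probs[r] for r in self.responses])[0]

    def reinforce(self, stimulus, response, reward):
        # The critic's reward strengthens (reward > 0) or weakens (reward < 0)
        # the pair; clipping keeps every response possible for later use.
        key = (stimulus, response)
        self.strength[key] = max(self.min_strength, self.strength[key] + reward)


table = StimulusResponseTable(["forward", "turn_left", "turn_right"])
table.reinforce("obstacle_ahead", "turn_left", +1.0)   # rewarded: avoided the obstacle
table.reinforce("obstacle_ahead", "forward", -0.5)     # punished: bumped into it
probs = table.probabilities("obstacle_ahead")
```

After these two reinforcements, "turn_left" becomes the most likely response to "obstacle_ahead" and "forward" the least likely, while the untouched "turn_right" keeps its initial strength.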

6
Decision Policy
  • The robot can observe the state of the
    environment
  • The robot has a set of actions it can perform
  • Policy: state/action mapping that determines
    which actions to take
  • Reinforcement is applied based on the results of
    the actions taken
  • Utility: the function that gives a utility value
    to each state
  • Goal: learn an optimal policy that chooses the
    best action for every set of possible inputs

7
Unsupervised Learning
  • RL is an unsupervised learning method
  • No target goal state
  • Feedback only provides information on the quality
    of the system's response
  • Simple binary fail/pass
  • Complex numerical evaluation
  • Through RL a robot learns on its own, using its
    own experiences and the feedback received
  • The robot is never told what to do

8
Challenges of RL
  • Credit assignment problem
  • When something good or bad happens, what exact
    state/condition-action/behavior should be
    rewarded or punished?
  • Learning from delayed rewards
  • It may take a long sequence of actions that
    receive insignificant reinforcement to finally
    arrive at a state with high reinforcement
  • How can the robot learn from reward received at
    some time in the future?

9
Challenges of RL
  • Exploration vs. exploitation
  • Explore unknown states/actions or exploit
    states/actions already known to yield high
    rewards
  • Partially observable states
  • In practice, sensors provide only partial
    information about the state
  • Choose actions that improve observability of the
    environment
  • Life-long learning
  • In many situations it may be required that robots
    learn several tasks within the same environment
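One common way to trade off exploration and exploitation (not named on this slide, so treat it as an illustrative choice) is epsilon-greedy selection: with a small probability epsilon the robot tries a random action, otherwise it takes the action currently believed to be best. The function name `epsilon_greedy`, the list-of-values input, and the value 0.1 are assumptions for this sketch.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index: explore with probability epsilon, else exploit.

    q_values holds one estimated value per action (an assumed representation).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))       # explore: any action, uniformly
    best = max(q_values)
    best_actions = [i for i, q in enumerate(q_values) if q == best]
    return rng.choice(best_actions)               # exploit: best known action, ties broken randomly


rng = random.Random(0)
q = [0.2, 0.9, 0.1]
picks = [epsilon_greedy(q, epsilon=0.1, rng=rng) for _ in range(1000)]
greedy_share = picks.count(1) / len(picks)        # roughly 0.9 + 0.1 / 3
```

A larger epsilon gathers more information about unknown states/actions at the cost of reward; many systems decrease epsilon over time as the estimates become more reliable.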

10
Types of RL Algorithms
  • Adaptive Heuristic Critic (AHC)
  • Learning the policy is separate from learning
    the utility function the critic uses for
    evaluation
  • Idea: try different actions in different states
    and observe the outcomes over time
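The separation between learning the policy and learning the critic's utility function can be sketched as a simplified one-step actor-critic in the spirit of AHC: the critic keeps a utility estimate per state and computes a temporal-difference error, and the actor nudges its preference for the action just taken by that same error. The classic AHC formulation adds refinements (such as eligibility traces) that are omitted here; the function names, the softmax action choice, the parameter values, and the one-step task are hypothetical.

```python
import math
import random
from collections import defaultdict


def softmax_choice(prefs, state, actions, rng=random):
    # Actor (policy): sample an action with probability proportional to exp(preference).
    weights = [math.exp(prefs[(state, a)]) for a in actions]
    return rng.choices(actions, weights=weights)[0]


def ahc_update(values, prefs, s, a, r, s_next, alpha=0.1, beta=0.1, gamma=0.9, terminal=False):
    # Critic: temporal-difference error of its own utility estimate for state s.
    target = r if terminal else r + gamma * values[s_next]
    td_error = target - values[s]
    values[s] += alpha * td_error      # critic learns the utility function
    prefs[(s, a)] += beta * td_error   # actor learns the policy from the critic's signal
    return td_error


# Hypothetical one-step task: in state "start", "right" pays 1 and "left" pays 0.
values, prefs = defaultdict(float), defaultdict(float)
rng = random.Random(1)
actions = ["left", "right"]
for _ in range(500):
    a = softmax_choice(prefs, "start", actions, rng)
    r = 1.0 if a == "right" else 0.0
    ahc_update(values, prefs, "start", a, r, None, terminal=True)
```

Because the critic's estimate stays below 1, choosing "right" always yields a positive TD error and choosing "left" a non-positive one, so the actor's preference shifts toward "right" over time.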

11
Q-Learning
  • Watkins 1980s
  • A single utility Q-function is learned to
    evaluate both actions and states
  • Q values are stored in a table
  • Updated at each step, using the following rule:
    $Q(x,a) \leftarrow Q(x,a) + \beta\,(r + \gamma E(y) - Q(x,a))$
  • x: current state, a: action, y: resulting state,
    $\beta$: learning rate, r: reward
  • $\gamma$: discount factor, in (0,1)
  • E(y) is the utility of the state y:
    $E(y) = \max_a Q(y,a)$ over all actions a
  • Guaranteed to converge to the optimal solution,
    given infinite trials (every state-action pair
    visited infinitely often, with a suitably
    decaying learning rate)
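The update rule above can be run on a toy problem to see the table converge. This sketch assumes a hypothetical five-state corridor (states 0 to 4, reward 1 for reaching the goal state 4), purely random exploratory actions, and assumed values for the learning rate `BETA` and discount `GAMMA`; none of these come from the lecture. Reaching the goal ends an episode, so for that last step the target is just r (a terminal state has no future utility).

```python
import random

N_STATES = 5            # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (-1, +1)      # move left or move right
BETA, GAMMA = 0.5, 0.9  # learning rate and discount factor (assumed values)


def step(x, a):
    """Hypothetical environment: move along the corridor, reward 1 at the goal."""
    y = min(max(x + a, 0), N_STATES - 1)
    r = 1.0 if y == N_STATES - 1 else 0.0
    return y, r, y == N_STATES - 1


def E(Q, y):
    # Utility of state y: the best Q-value over all actions.
    return max(Q[(y, a)] for a in ACTIONS)


def q_learning(episodes=200, seed=0):
    rng = random.Random(seed)
    Q = {(x, a): 0.0 for x in range(N_STATES) for a in ACTIONS}  # the Q-table
    for _ in range(episodes):
        x, done = 0, False
        while not done:
            a = rng.choice(ACTIONS)                 # explore with random actions
            y, r, done = step(x, a)
            target = r if done else r + GAMMA * E(Q, y)
            Q[(x, a)] += BETA * (target - Q[(x, a)])  # Q(x,a) <- Q(x,a) + beta*(r + gamma*E(y) - Q(x,a))
            x = y
    return Q


Q = q_learning()
policy = {x: max(ACTIONS, key=lambda a: Q[(x, a)]) for x in range(N_STATES - 1)}
```

Q-learning is off-policy: even though the robot acts randomly here, the learned table yields the greedy policy "always move right", with Q-values decaying by a factor of `GAMMA` per step away from the goal.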

12
Learning to Walk
  • Maes, Brooks (1990)
  • Genghis hexapod robot
  • Learned stable tripod stance and tripod gait
  • Rule-based subsumption controller
  • Two sensor modalities for feedback:
  • Two touch sensors to detect hitting the floor
    (negative feedback)
  • Trailing wheel to measure progress (positive
    feedback)

13
Learning to Walk
  • Nate Kohl & Peter Stone (2004)

14
Learning to Push
  • Mahadevan & Connell (1991)
  • Obelix: 8 ultrasonic sensors, 1 IR, motor current
  • Learned how to push a box (Q-learning)
  • Motor outputs grouped into 5 choices: move
    forward, turn left or right (22 degrees), sharp
    turn left/right (45 degrees)
  • 250,000 states

15
Supervised Learning
  • Supervised learning requires the user to give the
    exact solution to the robot in the form of the
    error direction and magnitude
  • The user must know the exact desired behavior for
    each situation
  • Supervised learning involves training, which can
    be very slow: the user must supervise the system
    with numerous examples

16
Neural Networks
  • One of the most widely used supervised learning
    methods
  • Used for approximating real-valued and
    vector-valued target functions
  • Inspired by biology: learning systems are built
    from complex networks of interconnected neurons
  • The goal is to minimize the error between the
    network output and the desired output
  • This is achieved by adjusting the weights on the
    network connections

17
Training Neural Networks
  • Hebbian learning
  • Increases synaptic strength along neural pathways
    associated with a stimulus and a correct response
  • Perceptron learning
  • Delta Rule for networks without hidden layers
  • Back-propagation for multi-layer networks
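Two of these rules are short enough to sketch directly for a single unit. The Hebbian update grows a weight when its input and the unit's output are active together; the delta (Widrow-Hoff) rule for a network without hidden layers takes a gradient step on the squared error between the target and the actual output. The function names, learning rates, and the toy training set (target = 2·x1 − 1, with x0 = 1 as a bias input) are assumptions for illustration.

```python
def hebbian_update(weights, x, y, eta=0.1):
    # Hebbian rule: w_i grows in proportion to input x_i times output y,
    # so connections that are active together get stronger.
    return [w + eta * xi * y for w, xi in zip(weights, x)]


def delta_rule_update(weights, x, target, eta=0.05):
    # Delta (Widrow-Hoff) rule for one linear unit with no hidden layer:
    # a gradient step on the squared error 0.5 * (target - output) ** 2.
    output = sum(w * xi for w, xi in zip(weights, x))
    error = target - output
    return [w + eta * error * xi for w, xi in zip(weights, x)]


# Hypothetical training set for target = 2*x1 - 1; x0 = 1 is a bias input.
data = [([1.0, x1], 2.0 * x1 - 1.0) for x1 in (0.0, 1.0, 2.0, 3.0)]
w = [0.0, 0.0]
for _ in range(500):                      # epochs
    for x, t in data:
        w = delta_rule_update(w, x, t)

hebb = hebbian_update([0.0, 0.0], [1.0, 0.0], 1.0)
```

The delta rule drives the weights toward [-1, 2], the exact solution for this data. The Hebbian rule, by contrast, uses no error signal at all: only the weight on the active input grows. Back-propagation extends the delta rule's error-gradient idea to hidden layers.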

18
Perceptron Learning
  • Repeat
  • Present an example from a set of positive and
    negative learning experiences
  • Verify the output of the network as to whether it
    is correct or incorrect
  • If it is incorrect, supply the correct output at
    the output unit
  • Adjust the synaptic weights of the perceptrons in
    a manner that reduces the error between the
    observed output and the correct output
  • Until satisfactory performance (convergence or
    stopping condition is met)
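The loop above maps almost line for line onto code. This sketch trains a single threshold unit on the logical AND function, a hypothetical training set chosen because it is linearly separable, so the perceptron convergence theorem guarantees the loop stops. The names `predict` and `train_perceptron` and the learning rate are illustrative assumptions.

```python
def predict(weights, bias, x):
    # Threshold unit: output 1 if the weighted sum exceeds 0, else 0.
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def train_perceptron(examples, eta=1.0, max_epochs=100):
    n = len(examples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(max_epochs):                  # Repeat ...
        mistakes = 0
        for x, target in examples:               # present an example
            output = predict(weights, bias, x)   # verify the network's output
            error = target - output              # compare with the correct output
            if error != 0:                       # adjust weights only when incorrect
                weights = [w + eta * error * xi for w, xi in zip(weights, x)]
                bias += eta * error
                mistakes += 1
        if mistakes == 0:                        # ... until convergence
            break
    return weights, bias


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)
```

The `max_epochs` cap plays the role of the "stopping condition": for data that is not linearly separable (e.g., XOR) the mistake count never reaches zero, and the cap ends the loop.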

19
ALVINN
  • ALVINN (Autonomous Land Vehicle in a Neural
    Network)
  • Dean Pomerleau (1991)
  • Pittsburgh to San Diego: 98.2% autonomous

20
Learning from Demonstration and RL
  • S. Schaal (1997)
  • Pole balancing, pendulum swing-up

21
Learning from Demonstration
  • Inspiration: human-like teaching by
    demonstration

[Figure: demonstration; robot performance]
22
Learning to Slalom
[Figure: demonstration; robot performance]
23
Learning from Robot Teachers
  • Transfer of task knowledge from humans to robots

[Figure: human demonstration; robot performance]
24
Classical Conditioning
  • Pavlov 1927
  • Assumes that unconditioned stimuli (e.g., food)
    automatically generate an unconditioned response
    (e.g., salivation)
  • Conditioned stimulus (e.g., ringing a bell) can,
    over time, become associated with the
    unconditioned response

25
Darwin VII
  • G. Edelman et al.
  • Darwin VII sensors:
  • CCD camera
  • Gripper that senses conductivity
  • IR sensors
  • Darwin VII actuators:
  • PTZ camera
  • Wheels
  • Gripper
  • Low-reflectivity walls and floor
  • Two types of stimulus blocks:
  • 6 cm metallic cubes
  • Blobs: low conductivity (bad taste)
  • Stripes: high conductivity (good taste)

26
Darwin's Perceptual Categorization
[Figure panels: early training; after the 10th stimulus]
  • Instead of hard-wiring stimulus-response rules,
    develop these associations over time

27
Genetic Algorithms
  • Inspired by evolutionary biology
  • Individuals in a population have a particular
    fitness with respect to a task
  • Individuals with the highest fitness are kept as
    survivors
  • Individuals with poor performance are discarded:
    the process of natural selection
  • The evolutionary process searches through the
    space of solutions to find the one with the
    highest fitness

28
Genetic Operators
  • Knowledge is encoded as bit strings (chromosomes)
  • Each bit represents a gene
  • Biologically inspired operators are applied to
    yield better generations
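The selection and operator steps on the last two slides can be sketched as a small genetic algorithm over bit-string chromosomes: rank by fitness, keep the fittest half as survivors, and fill the rest of the population with children produced by single-point crossover and per-gene mutation. The fitness function here (count of 1-bits, often called "OneMax") is a hypothetical stand-in; a robot task would instead score a controller decoded from the bits. Population size, mutation rate, and all function names are assumptions.

```python
import random


def fitness(chromosome):
    # Hypothetical task: number of 1-bits ("OneMax"). A robot task would
    # instead score a controller decoded from the bit string.
    return sum(chromosome)


def crossover(a, b, rng):
    point = rng.randrange(1, len(a))     # single-point crossover
    return a[:point] + b[point:]


def mutate(chromosome, rate, rng):
    # Flip each gene (bit) independently with probability `rate`.
    return [1 - g if rng.random() < rate else g for g in chromosome]


def evolve(n_bits=20, pop_size=30, generations=60, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]        # natural selection: keep the fittest half
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            children.append(mutate(crossover(p1, p2, rng), mutation_rate, rng))
        population = survivors + children              # poor performers are discarded
    return max(population, key=fitness)


best = evolve()
```

Because survivors pass to the next generation unchanged, the best fitness in the population can never decrease from one generation to the next. Mutation keeps introducing variation that crossover alone cannot create.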

29
Classifier Systems
  • ALECSYS system
  • Learns new behaviors and coordination
  • Genetic operators act upon a set of rules encoded
    by bit strings
  • Demonstrated tasks
  • Phototaxis
  • Coordination of approaching, chasing and escaping
    behaviors by combination, suppression and
    sequencing

30
Evolving Structure and Control
  • Karl Sims 1994
  • Evolved morphology and control for virtual
    creatures performing swimming, walking, jumping,
    and following
  • Genotypes encoded as directed graphs are used to
    produce 3D kinematic structures
  • Genotypes encode points of attachment
  • Sensors used: contact, joint angle and
    photosensors

31
Evolving Structure and Control
  • Jordan Pollack
  • Real structures

32
Fuzzy Control
  • Fuzzy control produces actions using a set of
    fuzzy rules based on fuzzy logic
  • In fuzzy logic, variables take values based on
    how much they belong to a particular fuzzy set
  • Fast, slow, far, near: not crisp values!
  • A fuzzy logic control system consists of:
  • Fuzzifier: maps sensor readings to fuzzy input
    sets
  • Fuzzy rule base: collection of IF-THEN rules
  • Fuzzy inference: maps fuzzy sets to other fuzzy
    sets according to the rule base
  • Defuzzifier: maps fuzzy outputs to crisp actuator
    commands
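The four components can be sketched end to end for a single input and output: a range reading is fuzzified into "near", "medium", and "far"; three IF-THEN rules map these to speed sets; inference fires each rule to the degree its IF-part holds; and a defuzzifier turns the result into one crisp speed command. The membership shapes, the 0 to 2 m range, the speed values, and the weighted-average defuzzifier (a simplification of centroid defuzzification) are all assumptions for illustration.

```python
SPEED = {"slow": 0.1, "cruise": 0.5, "fast": 1.0}                  # m/s, one representative value per output set
RULES = [("near", "slow"), ("medium", "cruise"), ("far", "fast")]  # IF distance is X THEN speed is Y


def fuzzify(distance):
    # Fuzzifier: degree (0..1) to which a range reading is near / medium / far.
    d = min(max(distance, 0.0), 2.0)      # clamp to an assumed 0-2 m sensor range
    return {
        "near": max(0.0, 1.0 - d),                 # 1 at 0 m, 0 from 1 m onward
        "medium": max(0.0, 1.0 - abs(d - 1.0)),    # peaks at 1 m
        "far": max(0.0, d - 1.0),                  # 0 up to 1 m, 1 at 2 m
    }


def infer(memberships):
    # Fuzzy inference: each rule fires to the degree its IF-part holds;
    # rules sharing a THEN-part are combined with max.
    fired = {}
    for antecedent, consequent in RULES:
        fired[consequent] = max(fired.get(consequent, 0.0), memberships[antecedent])
    return fired


def defuzzify(fired):
    # Defuzzifier: weighted average of the representative speeds
    # (a simplification of centroid defuzzification).
    total = sum(fired.values())
    return sum(SPEED[label] * mu for label, mu in fired.items()) / total


def fuzzy_speed(distance):
    return defuzzify(infer(fuzzify(distance)))
```

A reading of 0.5 m is half "near" and half "medium", so the command lands between the slow and cruise speeds instead of jumping abruptly at a crisp threshold.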

33
Examples of Fuzzy Control
  • Flakey the robot
  • Behaviors are encoded as collections of fuzzy
    rules
  • IF obstacle-close-in-front AND NOT
    obstacle-close-on-left THEN turn sharp-left
  • Each behavior may be active to a varying degree
  • Behavior responses are blended smoothly
  • Multiple goals can be pursued
  • Systems for learning fuzzy rules have also been
    developed
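The Flakey-style rule and the smooth blending of behavior responses can be sketched with the common fuzzy operators (AND as min, NOT as 1 − μ): the rule's activation is a degree between 0 and 1, and each behavior's preferred turn rate is weighted by its activation. The turn rates, the goal-seeking behavior and its activation, and all function names are hypothetical, not Flakey's actual implementation.

```python
def fuzzy_and(a, b):
    return min(a, b)        # min: a common choice for fuzzy AND


def fuzzy_not(a):
    return 1.0 - a          # standard fuzzy complement


def avoid_left_activation(close_in_front, close_on_left):
    # IF obstacle-close-in-front AND NOT obstacle-close-on-left THEN turn sharp-left
    return fuzzy_and(close_in_front, fuzzy_not(close_on_left))


def blend(behaviors):
    # Blend (activation, turn_rate) pairs into one command by weighted average;
    # positive turn rates mean turning left.
    total = sum(weight for weight, _ in behaviors)
    if total == 0.0:
        return 0.0          # no behavior active: go straight
    return sum(weight * turn for weight, turn in behaviors) / total


SHARP_LEFT = 45.0           # deg/s, hypothetical turn rate for "sharp-left"
avoid = avoid_left_activation(close_in_front=0.8, close_on_left=0.3)   # min(0.8, 0.7)
goal = 0.4                  # hypothetical activation of a goal-seeking behavior
command = blend([(avoid, SHARP_LEFT), (goal, -10.0)])                 # -10: gentle right turn toward the goal
```

Here the avoidance behavior is active to degree 0.7 and the goal-seeking behavior to degree 0.4, so the blended command (25 deg/s) is a compromise that still turns mostly left, which is how multiple goals can be pursued at once.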

34
Where Next?
35
Fringe Robotics: Beyond Behavior
  • Questions for the future
  • Human-like intelligence
  • Robot consciousness
  • Complete autonomy of complex thought and action
  • Emotions and imagination in artificial systems
  • Nanorobotics
  • Successor to human beings

36
A Robot Mind
  • The goal of AI is to build artificial minds
  • What is the mind?
  • The mind is what the brain does. (M. Minsky)
  • The mind includes
  • thinking
  • feeling

37
Computational Thought
  • What does it mean for a machine to think?
  • Bellman
  • Thought is not well defined, so we cannot
    ascribe/judge it
  • Computers can perform processes representative of
    human thought: decision making/learning
  • Albus
  • For robots to understand humans, they must be
    indistinguishable from humans in bodily
    appearance, physical and mental development
  • Brooks
  • Thought and consciousness need not be programmed
    in: they will emerge

38
The Turing Test
  • Developed by the mathematician Alan Turing
  • Original version of Turing Test
  • Two people (a man and a woman) are put in
    separate closed rooms. A third person can
    interact with each of the two through writing (no
    voices).
  • Can the 3rd person tell the difference between
    the man and the woman?

39
The Turing Test
  • AI version of the Turing Test
  • A person sits in front of two terminals: at one
    end is a human, at the other end is a computer.
    The questioner is free to ask any questions to
    the respondents at the other end of the terminals
  • If the questioner cannot tell the difference
    between the computer and the human subject, the
    computer has passed the Turing Test!

40
The Turing Test
  • The Turing Test contest is performed annually,
    and it carries a $100,000 award for anybody who
    passes it
  • No computer so far has truly passed the Turing
    Test
  • Is this a good test of intelligence?
  • Thought is defined based on human fallibility
    rather than on machine consciousness
  • Many researchers oppose using this test as a
    proof of intelligence

41
Penrose's Critique
  • Roger Penrose (The Emperor's New Mind, Shadows of
    the Mind), a British physicist, is a famous critic
    of AI
  • Intelligence is a consequence of neural activity
    and interactions in the brain
  • Computers can only simulate this activity, but
    this is not sufficient for true intelligence
  • Intelligence requires understanding, and
    understanding requires awareness, an aspect of
    consciousness
  • Many refuting arguments have been given

42
They're Made Out Of Meat
Terry Bisson
  • "They're made out of meat."
  • "Meat?"
  • "Meat. They're made out of meat."
  • "Meat?"
  • "There's no doubt about it. We picked several
    from different parts of the planet, took them
    aboard our recon vessels, probed them all the way
    through. They're completely meat."
  • "That's impossible. What about the radio
    signals? The messages to the stars."
  • "They use the radio waves to talk, but the
    signals don't come from them. The signals come
    from machines."
  • "So who made the machines? That's who we want to
    contact."

43
They're Made Out Of Meat
Terry Bisson
  • "They made the machines. That's what I'm trying
    to tell you. Meat made the machines."
  • "That's ridiculous. How can meat make a machine?
    You're asking me to believe in sentient meat."
  • "I'm not asking you, I'm telling you. These
    creatures are the only sentient race in the
    sector and they're made out of meat."
  • "Maybe they're like the Orfolei. You know, a
    carbon-based intelligence that goes through a
    meat stage."
  • "Nope. They're born meat and they die meat. We
    studied them for several of their life spans,
    which didn't take too long. Do you have any idea
    what's the life span of meat?"
  • "Spare me. Okay, maybe they're only part meat.
    You know, like the Weddilei. A meat head with an
    electron plasma brain inside."

44
They're Made Out Of Meat
Terry Bisson
  • "Nope. We thought of that, since they do have
    meat heads like the Weddilei. But I told you, we
    probed them. They're meat all the way through."
  • "No brain?"
  • "Oh, there is a brain all right. It's just that
    the brain is made out of meat!"
  • "So... what does the thinking?"
  • "You're not understanding, are you? The brain
    does the thinking. The meat."
  • "Thinking meat! You're asking me to believe in
    thinking meat!"
  • "Yes, thinking meat! Conscious meat! Loving
    meat. Dreaming meat. The meat is the whole deal!
    Are you getting the picture?"

45
Conclusion
  • Lots of remaining interesting problems to
    explore!
  • Get involved!

46
Readings
  • Lecture notes