Learning from Disagreeing Demonstrators

1
Learning from Disagreeing Demonstrators
  • Bruno N. da Silva
  • University of British Columbia
  • bnds@cs.ubc.ca

2
Motivation
  • Some traditional cases of Learning from
    Demonstration assume a human expert
  • In some (subjective) tasks, there might not be a
    single expert
  • e.g. how to drive from point A to point B

3
Motivation
  • In general, these tasks involve more than one
    feature
  • e.g. in the driving domain, we want to minimize
    travel time and the number of crashes
  • Different contexts lead to different tradeoffs
    between features
  • Idiosyncratic demonstrators do not reflect on
    their routine approach to the problem

4
Problem definition
  • How can we integrate idiosyncratic (disagreeing)
    demonstrations to form a homogeneous and
    effective policy?

5
Solution
  • We extend the framework presented by Argall et
    al. (2007)
  • Traditional demonstrations in the first stage
  • Robot execution and human critique in the second
    stage
  • Robot collects critiques
  • Robot updates policy

6
The 1st stage of the mechanism

7
The 2nd stage of the mechanism

8
A little more concretely
  • The first stage can be interpreted as a set of
    datapoints $(p_m, a_n, c)$
  • Perception $p_m$
  • Action $a_n$
  • Confidence $c$ in the mapping
  • The criticism will affect the confidence
  • If the critic praises the execution, increase $c$
  • If the critic knocks the execution, decrease $c$
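
The datapoint representation and the praise/knock rule above can be sketched in Python. This is only an illustration, not the presenter's implementation: the names DataPoint and apply_critique, the fixed step size, and the [0, 1] confidence range are all assumptions that do not appear in the slides.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class DataPoint:
    """One demonstrated mapping (p_m, a_n, c)."""
    perception: Tuple[float, ...]  # p_m: what the robot observed
    action: str                    # a_n: what the demonstrator did
    confidence: float = 1.0        # c: trust in the mapping (assumed range [0, 1])


def apply_critique(point: DataPoint, praised: bool, step: float = 0.1) -> None:
    """Raise c on praise, lower it on a knock (hypothetical fixed step)."""
    delta = step if praised else -step
    # Clamping to [0, 1] is an assumption, not something the slides state.
    point.confidence = min(1.0, max(0.0, point.confidence + delta))


# Two praises followed by one knock leave c one step above its start.
point = DataPoint(perception=(0.2, 0.7), action="brake", confidence=0.5)
apply_critique(point, praised=True)
apply_critique(point, praised=True)
apply_critique(point, praised=False)
```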

9
But let's not be naïve
  • If demonstrators can lie in their demonstrations, they
    can also lie in their criticism
  • Therefore, associate a reputation $r_i$ with each
    demonstrator $d_i$
  • And update the confidence level carefully
  • $c \leftarrow c + r_i \cdot f(\text{feedback})$
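
A minimal sketch of a reputation-weighted confidence update follows. It assumes the update is additive, i.e. c is moved by r_i times f(feedback), and that f returns +1 for praise and -1 for a knock. Both the encoding of f and the clamping to [0, 1] are illustrative assumptions, and the function names are hypothetical.

```python
def feedback_signal(praised: bool) -> float:
    """f(feedback): +1 for praise, -1 for a knock (assumed encoding)."""
    return 1.0 if praised else -1.0


def update_confidence(c: float, reputation: float, praised: bool) -> float:
    """Move c by the critic's reputation times the feedback signal.

    Assumes the additive form c <- c + r_i * f(feedback); the result is
    clamped to [0, 1], which is also an assumption.
    """
    return min(1.0, max(0.0, c + reputation * feedback_signal(praised)))


# Praise from a well-regarded critic moves c much further than a knock
# from a critic with a low reputation.
c_after_trusted = update_confidence(0.5, reputation=0.3, praised=True)
c_after_distrusted = update_confidence(0.5, reputation=0.05, praised=False)
```

Under this form, a critic whose reputation has dropped to zero can no longer change the policy at all.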

10
Adjusting reputation ranks
  • And adjust $r_i$ based on the (lack of) improvement from
    $d_i$'s feedback
  • $r_i \leftarrow r_i + \text{evaluation}(\text{feedback})$
  • evaluation(·) can be interpreted as checking whether
    the feedback led to a Pareto improvement
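
One way to read the Pareto-improvement idea is sketched below, using the two driving features mentioned earlier (travel time and number of crashes), both treated as costs. The +1/0/-1 scoring, the learning rate, the [0, 1] reputation range, and the function names are assumptions made for illustration only.

```python
from typing import Sequence


def pareto_evaluation(before: Sequence[float], after: Sequence[float]) -> int:
    """Score a change in feature costs (lower is better for every feature).

    Returns +1 for a Pareto improvement (no feature worse, at least one
    better), -1 for the mirror case, and 0 for a trade-off or no change.
    """
    pairs = list(zip(before, after))
    if all(a <= b for b, a in pairs) and any(a < b for b, a in pairs):
        return 1
    if all(a >= b for b, a in pairs) and any(a > b for b, a in pairs):
        return -1
    return 0


def update_reputation(r: float, before: Sequence[float],
                      after: Sequence[float], rate: float = 0.1) -> float:
    """r_i <- r_i + rate * evaluation, clamped to [0, 1] (assumed form)."""
    return min(1.0, max(0.0, r + rate * pareto_evaluation(before, after)))


# Features are (travel time in minutes, number of crashes).
r_improved = update_reputation(0.5, before=(30.0, 2), after=(28.0, 2))
r_tradeoff = update_reputation(0.5, before=(30.0, 2), after=(25.0, 3))
```

In this sketch a pure trade-off (faster but with more crashes) leaves the reputation unchanged, since neither outcome dominates the other.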

11
Current investigations
  • Policy convergence?
  • Rate of convergence?
  • What are the long term effects on human
    demonstrators?
  • Frustration?
  • Repudiation?
  • Will critiques really be mindful?

12
Thanks!
  • Questions?