Learning Language from its Perceptual Context - PowerPoint PPT Presentation

About This Presentation
Title:

Learning Language from its Perceptual Context

Description:

Learning Language from its Perceptual Context Ray Mooney Department of Computer Sciences University of Texas at Austin Joint work with David Chen Joohyun Kim – PowerPoint PPT presentation


Transcript and Presenter's Notes

Title: Learning Language from its Perceptual Context


1
Learning Language from its Perceptual Context
  • Ray Mooney
  • Department of Computer Sciences
  • University of Texas at Austin

Joint work with David Chen, Joohyun Kim, and Rohit Kate
2
Current State of Natural Language Learning
  • Most current state-of-the-art NLP systems are
    constructed by training on large supervised
    corpora.
  • Syntactic Parsing: Penn Treebank
  • Word Sense Disambiguation: Senseval
  • Semantic Role Labeling: PropBank
  • Machine Translation: Hansards corpus
  • Constructing such annotated corpora is difficult,
    expensive, and time consuming.

3
Semantic Parsing
  • A semantic parser maps a natural-language (NL)
    sentence to a complete, detailed formal semantic
    representation: logical form or meaning
    representation (MR).
  • For many applications, the desired output is
    computer language that is immediately executable
    by another program.

4
CLang: RoboCup Coach Language
  • In the RoboCup Coach competition, teams compete to
    coach simulated soccer players.
  • The coaching instructions are given in a formal
    language called CLang.

Simulated soccer field
5
Learning Semantic Parsers
  • Manually programming robust semantic parsers is
    difficult due to the complexity of the task.
  • Semantic parsers can be learned automatically
    from sentences paired with their logical form.

NL→MR Training Exs
Meaning Rep
6
Our Semantic-Parser Learners
  • CHILL+WOLFIE (Zelle & Mooney, 1996; Thompson &
    Mooney, 1999)
  • Separates parser-learning and semantic-lexicon
    learning.
  • Learns a deterministic parser using ILP
    techniques.
  • COCKTAIL (Tang & Mooney, 2001)
  • Improved ILP algorithm for CHILL.
  • SILT (Kate, Wong & Mooney, 2005)
  • Learns symbolic transformation rules for mapping
    directly from NL to MR.
  • SCISSOR (Ge & Mooney, 2005)
  • Integrates semantic interpretation into Collins'
    statistical syntactic parser.
  • WASP (Wong & Mooney, 2006; 2007)
  • Uses syntax-based statistical machine translation
    methods.
  • KRISP (Kate & Mooney, 2006)
  • Uses a series of SVM classifiers employing a
    string-kernel to iteratively build semantic
    representations.
  • SynSem (Ge & Mooney, 2009)
  • Uses an existing statistical syntactic parser and
    word alignment.

7
Learning Language from Perceptual Context
  • Children do not learn language from annotated
    corpora.
  • Neither do they learn language from just reading
    the newspaper, surfing the web, or listening to
    the radio.
  • Unsupervised language learning
  • DARPA Learning by Reading Program
  • The natural way to learn language is to perceive
    language in the context of its use in the
    physical and social world.
  • This requires inferring the meaning of utterances
    from their perceptual context.

8
Language Grounding
  • The meanings of many words are grounded in our
    perception of the physical world: red, ball, cup,
    run, hit, fall, etc.
  • Symbol Grounding: Harnad (1990)
  • Even many abstract words and meanings are
    metaphorical abstractions of terms grounded in
    the physical world: up, down, over, in, etc.
  • Lakoff and Johnson's Metaphors We Live By
  • It's difficult to put my ideas into words.
  • Most NLP work represents meaning without any
    connection to perception, circularly defining the
    meanings of words in terms of other words or
    meaningless symbols with no firm foundation.

9
Sample Circular Definitions from WordNet
  • sleep (v)
  • be asleep
  • asleep (adj)
  • in a state of sleep

10
Mary is on the phone
13
Ironing(Mommy, Shirt)
Mary is on the phone
14
Ironing(Mommy, Shirt)
Working(Sister, Computer)
Mary is on the phone
15
Ironing(Mommy, Shirt)
Carrying(Daddy, Bag)
Working(Sister, Computer)
Mary is on the phone
16
Ambiguous Training Example
Ironing(Mommy, Shirt)
Carrying(Daddy, Bag)
Working(Sister, Computer)
Talking(Mary, Phone)
Sitting(Mary, Chair)
Mary is on the phone
17
Next Ambiguous Training Example
Ironing(Mommy, Shirt)
Working(Sister, Computer)
Talking(Mary, Phone)
???
Sitting(Mary, Chair)
Mommy is ironing a shirt
18
Ambiguous Supervision for Learning Semantic
Parsers
  • Our model of ambiguous supervision corresponds to
    the type of data that will be gathered from a
    temporal sequence of perceptual contexts with
    occasional language commentary.
  • We assume each sentence has exactly one meaning
    in its perceptual context.
  • Recently extended to handle sentences with no
    meaning in their perceptual context.
  • Each meaning is associated with at most one
    sentence.

19
Sample Ambiguous Corpus
gave(daisy, clock, mouse)
ate(mouse, orange)
Daisy gave the clock to the mouse.
ate(dog, apple)
Mommy saw that Mary gave the hammer to the dog.
saw(mother, gave(mary, dog, hammer))
broke(dog, box)
The dog broke the box.
gave(woman, toy, mouse)
gave(john, bag, mouse)
John gave the bag to the mouse.
threw(dog, ball)
runs(dog)
The dog threw the ball.
saw(john, walks(man, dog))
Forms a bipartite graph
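A minimal Python sketch of the two constraints from the ambiguous-supervision model (each sentence has at most one meaning, each meaning goes with at most one sentence). The candidate sets are assumptions for illustration: the extracted slide text does not preserve which edges connect sentences to MRs, and `is_valid_matching` is a hypothetical helper name.

```python
# Assumed candidate sets: sentence -> MRs observable in its perceptual context.
candidates = {
    "Daisy gave the clock to the mouse.": [
        "gave(daisy, clock, mouse)",
        "ate(mouse, orange)",
    ],
    "The dog broke the box.": [
        "ate(mouse, orange)",
        "broke(dog, box)",
    ],
}


def is_valid_matching(matching, candidates):
    """Check a sentence -> MR (or None) assignment against the model.

    Valid means: every chosen MR comes from that sentence's candidate set,
    and no MR is assigned to more than one sentence.
    """
    used = [mr for mr in matching.values() if mr is not None]
    one_sentence_per_mr = len(used) == len(set(used))
    drawn_from_context = all(
        mr is None or mr in candidates[sentence]
        for sentence, mr in matching.items()
    )
    return one_sentence_per_mr and drawn_from_context


good = {
    "Daisy gave the clock to the mouse.": "gave(daisy, clock, mouse)",
    "The dog broke the box.": "broke(dog, box)",
}
# Both sentences claim the same MR, which the model forbids.
bad = {
    "Daisy gave the clock to the mouse.": "ate(mouse, orange)",
    "The dog broke the box.": "ate(mouse, orange)",
}
```

A sentence mapped to `None` corresponds to the extension that allows sentences with no meaning in their context.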
20
KRISPER (Kate & Mooney, 2007): KRISP with
EM-like Retraining
  • Extension of KRISP that learns from ambiguous
    supervision.
  • Uses an iterative EM-like self-training method to
    gradually converge on a correct meaning for each
    sentence.

21
KRISPER's Training Algorithm
1. Assume every possible meaning for a sentence
is correct
gave(daisy, clock, mouse)
ate(mouse, orange)
Daisy gave the clock to the mouse.
ate(dog, apple)
Mommy saw that Mary gave the hammer to the dog.
saw(mother, gave(mary, dog, hammer))
broke(dog, box)
The dog broke the box.
gave(woman, toy, mouse)
gave(john, bag, mouse)
John gave the bag to the mouse.
threw(dog, ball)
runs(dog)
The dog threw the ball.
saw(john, walks(man, dog))
23
KRISPER's Training Algorithm
2. Resulting NL-MR pairs are weighted and given
to KRISP
gave(daisy, clock, mouse)
ate(mouse, orange)
Daisy gave the clock to the mouse.
ate(dog, apple)
Mommy saw that Mary gave the hammer to the dog.
saw(mother, gave(mary, dog, hammer))
broke(dog, box)
The dog broke the box.
gave(woman, toy, mouse)
gave(john, bag, mouse)
John gave the bag to the mouse.
threw(dog, ball)
runs(dog)
The dog threw the ball.
saw(john, walks(man, dog))
Edge weights in the figure: a sentence with k candidate MRs gives
weight 1/k to each of its pairs (1/2 ×2, 1/4 ×4, 1/5 ×5, 1/3 ×6).
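The fractions on this slide are consistent with weighting each candidate pair by 1/k, where k is the number of candidate MRs for the sentence. A small sketch of that weighting, assuming a two-candidate set for the first sentence (the function name `initial_weights` is hypothetical):

```python
from fractions import Fraction


def initial_weights(candidates):
    """Give each (sentence, MR) pair weight 1/k, k = candidates for that sentence."""
    weights = {}
    for sentence, mrs in candidates.items():
        for mr in mrs:
            weights[(sentence, mr)] = Fraction(1, len(mrs))
    return weights


# Assumed candidate set for one sentence from the slide.
candidates = {
    "Daisy gave the clock to the mouse.": [
        "gave(daisy, clock, mouse)",
        "ate(mouse, orange)",
    ],
}
weights = initial_weights(candidates)
```

With this weighting, the weights for each sentence sum to 1, so a sentence with many candidates does not dominate training.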
24
KRISPER's Training Algorithm
3. Estimate the confidence of each NL-MR pair
using the resulting trained parser
gave(daisy, clock, mouse)
ate(mouse, orange)
Daisy gave the clock to the mouse.
ate(dog, apple)
Mommy saw that Mary gave the hammer to the dog.
saw(mother, gave(mary, dog, hammer))
broke(dog, box)
The dog broke the box.
gave(woman, toy, mouse)
gave(john, bag, mouse)
John gave the bag to the mouse.
threw(dog, ball)
runs(dog)
The dog threw the ball.
saw(john, walks(man, dog))
25
KRISPER's Training Algorithm
4. Use maximum weighted matching on a bipartite
graph to find the best NL-MR pairs (Munkres,
1957)
gave(daisy, clock, mouse)
0.92
ate(mouse, orange)
Daisy gave the clock to the mouse.
0.11
ate(dog, apple)
0.32
0.88
Mommy saw that Mary gave the hammer to the dog.
saw(mother, gave(mary, dog, hammer))
0.22
0.24

broke(dog, box)
0.18
0.71
0.85
The dog broke the box.
gave(woman, toy, mouse)
0.14
0.95
gave(john, bag, mouse)
0.24
0.89
John gave the bag to the mouse.
threw(dog, ball)
0.33
0.97
runs(dog)
0.81
The dog threw the ball.
0.34
saw(john, walks(man, dog))
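A sketch of step 4 on a toy graph. This is an exhaustive search, used here as a simple stand-in for the Hungarian (Munkres) method the slide cites; it returns the same optimum but takes exponential time. The pairing of scores to edges is made up, since the extracted text does not say which confidence belongs to which edge.

```python
def best_matching(scores):
    """Find the one-to-one sentence -> MR assignment with the highest total score.

    scores maps (sentence, mr) -> confidence. Sentences may stay unmatched.
    Brute force: suitable for toy graphs only.
    """
    sentences = sorted({sentence for sentence, _ in scores})

    def search(i, used):
        if i == len(sentences):
            return 0.0, {}
        sentence = sentences[i]
        # Option 1: leave this sentence unmatched.
        best_total, best_assign = search(i + 1, used)
        # Option 2: match it to any candidate MR that is still free.
        for (s, mr), score in scores.items():
            if s == sentence and mr not in used:
                total, assign = search(i + 1, used | {mr})
                if score + total > best_total:
                    best_total = score + total
                    best_assign = {**assign, sentence: mr}
        return best_total, best_assign

    return search(0, frozenset())


# Made-up confidences on assumed edges.
scores = {
    ("The dog broke the box.", "broke(dog, box)"): 0.85,
    ("The dog broke the box.", "threw(dog, ball)"): 0.33,
    ("The dog threw the ball.", "threw(dog, ball)"): 0.97,
    ("The dog threw the ball.", "runs(dog)"): 0.81,
}
total, assignment = best_matching(scores)
```

For a real corpus, a polynomial-time assignment solver would replace the recursive search.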
27
KRISPER's Training Algorithm
5. Give the best pairs to KRISP in the next
iteration, and repeat until convergence
gave(daisy, clock, mouse)
ate(mouse, orange)
Daisy gave the clock to the mouse.
ate(dog, apple)
Mommy saw that Mary gave the hammer to the dog.
saw(mother, gave(mary, dog, hammer))
broke(dog, box)
The dog broke the box.
gave(woman, toy, mouse)
gave(john, bag, mouse)
John gave the bag to the mouse.
threw(dog, ball)
runs(dog)
The dog threw the ball.
saw(john, walks(man, dog))
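The five steps can be put together as one loop. The sketch below is not KRISP: `train_fn` and `score_fn` are toy stand-ins (weighted word/predicate co-occurrence counts), and `match_fn` is a greedy matcher rather than true maximum weighted matching. Only the control flow follows the slides. The three-sentence corpus is invented.

```python
from collections import defaultdict


def predicate(mr):
    return mr.split("(")[0]


def train_fn(pairs):
    """Toy 'parser': weighted co-occurrence counts of words and MR predicates."""
    assoc = defaultdict(float)
    for (sentence, mr), weight in pairs.items():
        for word in sentence.split():
            assoc[(word, predicate(mr))] += weight
    return assoc


def score_fn(model, sentence, mr):
    """Confidence of a pair: summed association of its words with the predicate."""
    return sum(model.get((word, predicate(mr)), 0.0) for word in sentence.split())


def match_fn(scores):
    """Greedy one-to-one matching, highest score first (not optimal in general)."""
    matching, used = {}, set()
    for (sentence, mr), _ in sorted(scores.items(), key=lambda kv: -kv[1]):
        if sentence not in matching and mr not in used:
            matching[sentence] = mr
            used.add(mr)
    return matching


def em_like_retraining(candidates, max_iters=10):
    # Steps 1-2: assume every candidate is correct, weighted 1/k.
    pairs = {(s, mr): 1.0 / len(mrs) for s, mrs in candidates.items() for mr in mrs}
    matching = None
    for _ in range(max_iters):
        model = train_fn(pairs)
        # Step 3: score every candidate pair with the trained model.
        scores = {(s, mr): score_fn(model, s, mr)
                  for s, mrs in candidates.items() for mr in mrs}
        # Step 4: pick the best one-to-one pairs.
        new_matching = match_fn(scores)
        if new_matching == matching:  # converged
            break
        matching = new_matching
        # Step 5: retrain on the best pairs only.
        pairs = {(s, mr): 1.0 for s, mr in matching.items()}
    return matching


candidates = {
    "dog broke box": ["broke(dog, box)", "ate(dog, apple)"],
    "dog ate apple": ["ate(dog, apple)", "broke(dog, box)"],
    "mouse ate orange": ["ate(mouse, orange)"],
}
matching = em_like_retraining(candidates)
```

In this toy corpus the unambiguous third sentence ties the word "ate" to the `ate` predicate, which is enough to resolve the other two sentences.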
28
New Challenge: Learning to Be a Sportscaster
  • Goal: Learn from realistic data of natural
    language used in a representative context while
    avoiding difficult issues in computer perception
    (i.e., speech and vision).
  • Solution: Learn from textually annotated traces
    of activity in a simulated environment.
  • Example: Traces of games in the Robocup simulator
    paired with textual sportscaster commentary.

29
Tactical Generation
  • Learn how to generate NL from MR
  • Example:

Pass(Pink2, Pink3) → Pink2 kicks the ball to
Pink3
30
WASP / WASP⁻¹ (Wong & Mooney, 2006, 2007)
  • Supervised system for learning both a semantic
    parser and a tactical language generator.
  • Uses a probabilistic version of a synchronous
    context-free grammar (SCFG) that generates two
    corresponding strings (NL & MR) simultaneously.

31
Grounded Language Learning in Robocup
Robocup Simulator
Sportscaster
Score!!!!
32
Sample Human Sportscast in Korean
33
Robocup Sportscaster Trace
Natural Language Commentary
Meaning Representation
badPass ( Purple1, Pink8 )
turnover ( Purple1, Pink8 )
Purple goalie turns the ball over to Pink8
kick ( Pink8)
pass ( Pink8, Pink11 )
Purple team is very sloppy today
kick ( Pink11 )
Pink8 passes the ball to Pink11
Pink11 looks around for a teammate
kick ( Pink11 )
ballstopped
kick ( Pink11 )
Pink11 makes a long pass to Pink8
pass ( Pink11, Pink8 )
kick ( Pink8 )
pass ( Pink8, Pink11 )
Pink8 passes back to Pink11
36
Robocup Sportscaster Trace
Natural Language Commentary
Meaning Representation
P6 ( C1, C19 )
P5 ( C1, C19 )
Purple goalie turns the ball over to Pink8
P1( C19 )
P2 ( C19, C22 )
Purple team is very sloppy today
P1 ( C22 )
Pink8 passes the ball to Pink11
Pink11 looks around for a teammate
P1 ( C22 )
P0
P1 ( C22 )
Pink11 makes a long pass to Pink8
P2 ( C22, C19 )
P1 ( C19 )
P2 ( C19, C22 )
Pink8 passes back to Pink11
37
Strategic Generation
  • Generation requires not only knowing how to say
    something (tactical generation) but also what to
    say (strategic generation).
  • For automated sportscasting, one must be able to
    effectively choose which events to describe.

38
Example of Strategic Generation
pass ( purple7 , purple6 )
ballstopped
kick ( purple6 )
pass ( purple6 , purple2 )
ballstopped
kick ( purple2 )
pass ( purple2 , purple3 )
kick ( purple3 )
badPass ( purple3 , pink9 )
turnover ( purple3 , pink9 )
40
Robocup Data
  • Collected human textual commentary for the 4
    Robocup championship games from 2001-2004.
  • Avg events/game: 2,613
  • Avg English sentences/game: 509
  • Avg Korean sentences/game: 499
  • Each sentence matched to all events within
    previous 5 seconds.
  • Avg MRs/sentence: 2.5 (min 1, max 12)
  • Manually annotated with correct matchings of
    sentences to MRs (for evaluation purposes only).
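A sketch of how the ambiguous training pairs could be built from a time-stamped trace. The timestamps are invented, and treating both ends of the 5-second window as inclusive is an assumption; the slides do not specify it.

```python
def candidate_mrs(comment_time, events, window=5.0):
    """Return MRs of events that occurred within `window` seconds before the comment."""
    return [mr for t, mr in events if comment_time - window <= t <= comment_time]


# (timestamp in seconds, MR); timestamps are made up for illustration.
events = [
    (10.0, "badPass(Purple1, Pink8)"),
    (10.5, "turnover(Purple1, Pink8)"),
    (13.0, "kick(Pink8)"),
    (18.0, "pass(Pink8, Pink11)"),
]
# A comment typed at t = 14.0 is paired with every event from t = 9.0 to 14.0.
cands = candidate_mrs(14.0, events)
```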

41
WASPER
  • WASP with EM-like retraining to handle ambiguous
    training data.
  • Same augmentation as added to KRISP to create
    KRISPER.

42
KRISPER-WASP
  • First train KRISPER to disambiguate the data
  • Then train WASP on the resulting unambiguously
    supervised data.

43
WASPER-GEN
  • Determines the best matching based on generation
    (MR→NL).
  • Score each potential NL/MR pair by using the
    currently trained WASP⁻¹ generator.
  • Compute NIST MT score (NIST report, 2002) between
    the generated sentence and the potential matching
    sentence.

44
Strategic Generation Learning
  • For each event type (e.g. pass, kick) estimate
    the probability that it is described by the
    sportscaster.
  • Requires correct NL/MR matching
  • Use estimated matching from tactical generation
  • Iterative Generation Strategy Learning
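Given some NL/MR matching, the per-event-type probability can be estimated as a simple ratio: commented events of that type over all events of that type. The sketch below assumes a matching is already available; which events count as commented here is invented. It is not the IGSL procedure itself.

```python
from collections import Counter


def event_type(mr):
    return mr.split("(")[0].strip()


def commentary_probabilities(all_events, commented_events):
    """Estimate P(event type is described) = commented count / total count."""
    total = Counter(event_type(mr) for mr in all_events)
    commented = Counter(event_type(mr) for mr in commented_events)
    return {etype: commented[etype] / n for etype, n in total.items()}


all_events = [
    "pass(purple7, purple6)",
    "ballstopped",
    "kick(purple6)",
    "pass(purple6, purple2)",
    "kick(purple2)",
    "badPass(purple3, pink9)",
]
# Assumed: the sportscaster described one pass and the bad pass.
commented = ["pass(purple7, purple6)", "badPass(purple3, pink9)"]
probs = commentary_probabilities(all_events, commented)
```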

45
Iterative Generation Strategy Learning (IGSL)
  • Estimates the likelihood of commenting on each
    event-type directly from the ambiguous training
    data.
  • Uses EM-like self-training iterations to compute
    estimates.

46
English Demo
  • Game clip commentated using WASPER-GEN with IGSL
    strategic generation, since this gave the best
    results for generation.
  • FreeTTS was used to synthesize speech from
    textual output.

47
Machine Sportscast in English
48
Experimental Evaluation
  • Generated learning curves by training on all
    combinations of 1 to 3 games and testing on all
    games not used for training.
  • Baselines:
  • Random Matching: WASP trained on a random choice
    of possible MR for each comment.
  • Gold Matching: WASP trained on the correct
    matching of MR for each comment.
  • Metrics:
  • Precision: fraction of the system's annotations
    that are correct
  • Recall: fraction of gold-standard annotations
    correctly produced
  • F-measure: harmonic mean of precision and recall
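The three metrics can be computed over sets of (sentence, MR) pairs. A minimal sketch with made-up system and gold sets:

```python
def precision_recall_f(system, gold):
    """Compute precision, recall and F-measure for sets of (sentence, MR) pairs."""
    correct = len(system & gold)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Made-up example: one of the system's two annotations is in the gold standard.
system = {
    ("Pink8 passes the ball to Pink11", "pass(Pink8, Pink11)"),
    ("Purple team is very sloppy today", "kick(Pink8)"),
}
gold = {
    ("Pink8 passes the ball to Pink11", "pass(Pink8, Pink11)"),
    ("Purple goalie turns the ball over to Pink8", "turnover(Purple1, Pink8)"),
}
precision, recall, f_measure = precision_recall_f(system, gold)
```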

49
Evaluating NL-MR Matching
  • How well does the learner figure out which event
    (if any) each sentence refers to?

Natural Language Commentary
Meaning Representation
badPass ( Purple1, Pink8 )
turnover ( Purple1, Pink8 )
Purple goalie turns the ball over to Pink8
kick ( Pink8)
pass ( Pink8, Pink11 )
Purple team is very sloppy today
kick ( Pink11 )
Pink8 passes the ball to Pink11
50
Matching Results (F-Measure)
51
Evaluating Semantic Parsing
  • How well does the system learn to interpret the
    meaning of a novel sentence?
  • Compare result to correct MR from the gold
    standard matches.

Natural Language Commentary
Meaning Representation
turnover ( Purple1, Pink8 )
Purple goalie loses the ball to Pink8
52
Semantic Parsing Results (F-Measure)
53
Evaluating Tactical Generation
  • How accurately does the system generate natural
    language descriptions of events?
  • Use gold-standard matches to determine the
    correct sentence for each MR that has one.
  • Evaluation Metric:
  • BLEU score (Papineni et al., 2002), N=4
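For reference, a simplified sentence-level BLEU with n-grams up to 4: the geometric mean of modified n-gram precisions times a brevity penalty. This sketch uses a single reference and no smoothing, so it returns 0 whenever some n-gram order has no overlap. Standard BLEU is computed over a whole corpus, so treat this as an illustration rather than the evaluation script used in the experiments.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Unsmoothed single-reference BLEU for whitespace-tokenized sentences."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Penalize candidates shorter than the reference.
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)


same = bleu("Pink8 passes the ball to Pink11", "Pink8 passes the ball to Pink11")
different = bleu("Pink8 kicks", "Purple goalie turns the ball over")
```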

Natural Language Commentary
Meaning Representation
turnover ( Purple1, Pink8 )
Purple goalie loses the ball to Pink8
54
Tactical Generation Results (BLEU Score)
55
Evaluating Strategic Generation
  • How well does the system predict which events the
    human sportscaster will mention?

pass ( purple7 , purple6 )
ballstopped
kick ( purple6 )
pass ( purple6 , purple2 )
ballstopped
kick ( purple2 )
pass ( purple2 , purple3 )
kick ( purple3 )
badPass ( purple3 , pink9 )
turnover ( purple3 , pink9 )
56
Strategic Generation Results
57
Human Evaluation: Pseudo Turing Test
  • Used Amazon's Mechanical Turk to recruit human
    judges (36 English, 7 Korean judges per video)
  • 8 commented game clips
  • 4 minute clips randomly selected from each of the
    4 games
  • Each clip commented once by a human, and once by
    the machine
  • Presented in random counter-balanced order
  • Judges were not told which ones were human or
    machine generated

58
Human Evaluation Metrics
Score  English Fluency  Semantic Correctness  Sportscasting Ability
5      Flawless         Always                Excellent
4      Good             Usually               Good
3      Non-native       Sometimes             Average
2      Disfluent        Rarely                Bad
1      Gibberish        Never                 Terrible
Human? Judges were also asked to predict whether a
human or a machine generated each sportscast,
knowing there were some of each in the data.
59
English Human Evaluation Results
60
Korean Human Evaluation Results
61
Future Direction 1
  • Grounded language learning for direction
    following in virtual environments.
  • Eventual goal: Virtual agents in video games and
    educational software that can take and give
    instructions in natural language.

62
Challenge on Generating Instructions in Virtual
Environments (GIVE)
http://www.give-challenge.org/research/
63
Learning Approach for Grounded Instructional
Language Learning
  • Passive learning
  • Observes human instructor guiding a human
    follower
  • Interactive learning as follower
  • Tries to follow human instructions
  • Interactive learning as instructor
  • Generates instructions to guide human follower

64
Future Direction 2: Learning for Language and
Vision
  • Natural Language Processing (NLP) and Computer
    Vision (CV) are both very challenging problems.
  • Machine Learning (ML) is now extensively used to
    automate the construction of both effective NLP
    and CV systems.
  • Generally uses supervised ML and requires
    difficult and expensive human annotation of large
    text or image/video corpora for training.

65
Cross-Supervision of Language and Vision
  • Use naturally co-occurring perceptual input to
    supervise language learning.
  • Use naturally co-occurring linguistic input to
    supervise visual learning.

Blue cylinder on top of a red cube.
66
Activity Recognition in Video
  • Recognizing activities in video generally uses
    supervised learning trained on human-labeled
    video clips.
  • Linguistic information in closed captions (CCs)
    can be used as weak supervision for training
    activity recognizers.
  • Automatically trained activity recognizers can be
    used to improve precision of video retrieval.

67
Sample Soccer Videos
Save
Kick
I do not think there is any real intent, just
trying to make sure he gets his body across, but
it was a free kick .
Good save as well.
I think brown made a wonderful fingertip save
there.
Lovely kick.
And it is a really chopped save
Goal kick.
68
Throw
Touch
If you are defending a lead, your throw back
takes it that far up the pitch and gets a
throw-in.
All it needed was a touch.
When they are going to pass it in the back, it
is a really pure touch.
Another shot for a throw.
Look at that, Henry, again, he had time on the
ball to take another touch and prepare that ball
properly.
And Carlos Tevez has won the throw.
69
Conclusions
  • Current language learning work uses expensive,
    unrealistic training data.
  • We have developed language learning systems that
    can learn from sentences paired with an ambiguous
    perceptual environment.
  • We have evaluated them on learning to sportscast
    simulated Robocup games, where they learn to
    commentate games about as well as humans.
  • Learning to connect language and perception is an
    important and exciting research problem.