1
Semantics
2
Where are we in the Big Picture?
[Pipeline: Speech → ASR → Text → Morph Analysis → Parsing → Syntactic Parse → Semantic Interpreter → Semantic Representation → Inference Engine → World of Facts]
3
Semantic Representation
  • Syntactic representation
  • phrases and tree structures
  • dependency information between words
  • Semantic representation
  • What's the purpose of this representation?
  • Interface between syntactic information and the
    inference engine
  • Requirements on the semantic representation
  • Supports inference
  • Every CEO is wealthy and Gates is a CEO ⇒ Gates
    is wealthy
  • Normalizes syntactic variations
  • Delta serves NYC ≡ NYC is served by Delta
  • Has the capacity to represent the distinctions
    in language phenomena
  • John believes Delta serves NYC ⇏ Delta serves NYC
  • Unambiguous representation
  • John wants to eat someplace close to the
    university

4
Mechanisms for Expressing Meaning
  • Linguistic means for expressing meaning
  • Words: lexical semantics and word senses
  • Delta will serve NYC
  • This flight will serve peanuts
  • John will serve as CEO
  • Syntactic information: predicate-argument
    structure
  • John wants to eat a cake
  • John wants Mary to eat a cake
  • John wants a cake
  • Prosodic information in speech
  • Legumes are a good source of vitamins
  • Gesture information in multimodal communication

5
First-order Predicate Calculus A refresher
  • A formal system used to derive new propositions
    and verify their truth given a world.
  • Syntax of FOPC
  • Formulae: quantifiers and connectives over
    predicates
  • Predicates: n-ary predications of facts and
    relations
  • Terms: constants, variables and functions
  • World: truth assignments to formulae
  • Inference
  • Modus ponens
  • Every CEO is wealthy: ∀x CEO(x) ⇒ wealthy(x)
  • Gates is a CEO: CEO(Gates)
  • Derives wealthy(Gates)
  • Given a world, determining the truth value of a
    formula is a search process (backward chaining
    and forward chaining; see the sketch below)
  • Much like the top-down and bottom-up parsing
    algorithms.
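As an illustration of modus ponens and forward chaining (not part of the original slides), here is a minimal Python sketch; the encoding of facts and rules is an assumption made purely for the example.

# Minimal forward-chaining sketch for the CEO example (illustrative only).
# Facts are ground atoms like ("CEO", "Gates"); each rule (p, q) stands for
# "for all x: p(x) => q(x)".
facts = {("CEO", "Gates")}
rules = [("CEO", "wealthy")]            # every CEO is wealthy

def forward_chain(facts, rules):
    """Apply modus ponens until no new facts can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for pred, arg in list(derived):
                if pred == premise and (conclusion, arg) not in derived:
                    derived.add((conclusion, arg))    # modus ponens
                    changed = True
    return derived

print(forward_chain(facts, rules))
# {('CEO', 'Gates'), ('wealthy', 'Gates')} -- i.e. derives wealthy(Gates)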

6
Logic for Language
  • Representations for different aspects of
    language.
  • Entities
  • Delta, Gates, ATT
  • Categories
  • restaurants, airlines, students
  • Events
  • I ate lunch. I ate at my desk. ⇒ I ate lunch at my
    desk
  • Time (utterance time, reference time, event time)
  • I ate lunch when the flight arrived
  • I had eaten lunch when the flight arrived
  • Aspect
  • Stative, activity, achievement and accomplishment
  • Quantification
  • Every person loves some movie
  • Predication
  • John is a teacher
  • Modal operators
  • John believes Mary went to the movies

7
Linking Syntax and Semantics
  • How to compute semantic representations from
    syntactic trees?
  • We could have one function for each syntactic
    tree that maps it to its semantic representation.
  • Too many such functions
  • Not all aspects of the tree might be needed for
    its semantics
  • Meaning derives from
  • The people and activities represented (predicates
    and arguments, or nouns and verbs)
  • The way they are ordered and related: the syntax
    of the representation, which may also reflect the
    syntax of the sentence
  • Compositionality Assumption: the meaning of the
    whole sentence is composed of the meanings of its
    parts.
  • George cooks. Dan eats. Dan is sick.
  • Cook(George), Eat(Dan), Sick(Dan)
  • If George cooks and Dan eats, Dan will get sick.
  • (Cook(George) ∧ Eat(Dan)) ⇒ Sick(Dan)
  • The trick is to decide on what the size of the
    part should be.
  • Rule-by-rule hypothesis

8
Linking Syntax and Semantics contd.
  • Compositionality
  • Augment the lexicon and the grammar (as we did
    with feature structures)
  • Devise a mapping between rules of the grammar and
    rules of semantic representation
  • For CFGs, this amounts to a Rule-to-Rule
    Hypothesis
  • Each grammar rule is embellished with
    instructions on how to map the components of the
    rule to a semantic representation.
  • S → NP VP  {VP.sem(NP.sem)}
  • Each semantic function is defined in terms of the
    semantic representation of choice.

9
Syntax-Driven Semantics
  • There are still a few free parameters
  • What should the semantic representation of each
    component be?
  • How should we combine the component
    representations?
  • Depends on what final representation we want.

10
A Simple Example
  • McDonalds serves burgers.
  • Associating constants with constituents
  • ProperNoun → McDonalds  {McDonalds}
  • PlNoun → burgers  {burgers}
  • Defining functions to produce these from input
  • NP → ProperNoun  {ProperNoun.sem}
  • NP → PlNoun  {PlNoun.sem}
  • Assumption: meaning representations of children
    are passed up to parents for non-branching
    constituents
  • Verbs are where the action is

11
  • V → serves  {∃(e,x,y) Isa(e,Serving) ∧
    Server(e,x) ∧ Served(e,y)} where e = event, x =
    agent, y = patient
  • Will every verb have its own distinct
    representation?
  • McDonalds hires students.
  • McDonalds gave customers a bonus.
  • Predicate(Agent, Patient, Beneficiary)
  • Once we have the semantics for each constituent,
    how do we combine them?
  • VP → V NP  {V.sem(NP.sem)}
  • Goal for VP semantics: ∃(e,x) Isa(e,Serving) ∧
    Server(e,x) ∧ Served(e,burgers)
  • VP.sem must tell us
  • Which variables are to be replaced by which
    arguments
  • How this replacement is done

12
Lambda Notation
  • Extension to First Order Predicate Calculus: λx
    P(x)
  • λ variable(s) + FOPC expression in those
    variables
  • Lambda binding
  • Apply a lambda expression to logical terms to bind
    the lambda expression's parameters to terms (lambda
    reduction)
  • Simple process: substitute terms for variables in
    the lambda expression, e.g. λx P(x)(car) ⇒ P(car)
    (see the sketch below)
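Lambda binding can be mimicked directly with a host language's own functions. The following minimal Python sketch (my own illustration, not the slides' notation) treats λx P(x) as a function over logical terms and applies it to car:

# Represent λx P(x) as a Python function; formulas are plain strings.
P = lambda x: f"P({x})"      # the body P(x)
lam = lambda x: P(x)         # λx P(x)

print(lam("car"))            # lambda reduction: λx P(x)(car) => P(car)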

13
Lambda Abstraction and Application
  • Abstraction Make variable in the body available
    for binding.
  • to external arguments provided by semantics of
    other constituents (e.g. NPs)
  • Application Substitute the bound variable with
    the value
  • Semantic attachment for
  • V → serves  V.sem(NP.sem)
  • ∃(e,x,y) Isa(e,Serving) ∧ Server(e,y) ∧
    Served(e,x) converts to the lambda expression
  • λx ∃(e,y) Isa(e,Serving) ∧ Server(e,y) ∧
    Served(e,x)
  • Now x is available to be bound when V.sem is
    applied to the NP.sem of the direct object
    (V.sem(NP.sem))
  • λ-application binds x to the value of NP.sem
    (burgers)
  • Value of VP.sem becomes
  • ∃(e,y) Isa(e,Serving) ∧ Server(e,y) ∧
    Served(e,burgers)

14
Lambda Abstraction and Application contd.
  • Similarly, we need a semantic attachment for S →
    NP VP  {VP.sem(NP.sem)} to add the subject NP to
    our semantic representation of McDonalds serves
    burgers
  • Back to V.sem for serves
  • We need another λ-abstraction in the value of
    VP.sem
  • Change the semantic representation of V to include
    another argument to be bound later
  • V → serves  {λx λy ∃(e) Isa(e,Serving) ∧
    Server(e,y) ∧ Served(e,x)}
  • Value of VP.sem becomes
  • λy ∃(e) Isa(e,Serving) ∧ Server(e,y) ∧
    Served(e,burgers)
  • Value of S.sem becomes
  • ∃(e) Isa(e,Serving) ∧ Server(e,McDonalds) ∧
    Served(e,burgers)
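Putting the attachments for V, VP, and S together, here is a minimal string-based sketch of the whole derivation for McDonalds serves burgers (the encoding is my own; the slides do not prescribe an implementation):

# Semantic attachment for V -> serves, as curried functions over strings:
# λx λy ∃e Isa(e,Serving) ∧ Server(e,y) ∧ Served(e,x)
serves_sem = lambda x: (lambda y:
    f"∃e Isa(e,Serving) ∧ Server(e,{y}) ∧ Served(e,{x})")

# VP -> V NP  {V.sem(NP.sem)}: binds the Served (direct object) argument
vp_sem = serves_sem("burgers")

# S -> NP VP  {VP.sem(NP.sem)}: binds the Server (subject) argument
s_sem = vp_sem("McDonalds")

print(s_sem)
# ∃e Isa(e,Serving) ∧ Server(e,McDonalds) ∧ Served(e,burgers)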

15
Several Complications
  • For example, terms can be complex
  • A restaurant serves burgers.
  • a restaurant: ∃x Isa(x,restaurant)
  • ∃e Isa(e,Serving) ∧ Server(e,<∃x
    Isa(x,restaurant)>) ∧ Served(e,burgers)
  • Allows quantified expressions to appear where
    terms can, by providing rules to turn them into
    well-formed FOPC expressions
  • Issues of quantifier scope
  • Every restaurant serves burgers.
  • Every restaurant serves every burger.

16
  • Semantic representations for other constituents?
  • Adjective phrases
  • Happy people, cheap food, purple socks
  • intersective semantics
  • Nom → Adj Nom  {λx Nom.sem(x) ∧ Isa(x,Adj.sem)}
  • Adj → cheap  {Cheap}
  • λx Isa(x,Food) ∧ Isa(x,Cheap) works OK
  • But… fake gun? Local restaurant? Former friend?
    Would-be singer?
  • ∃x Isa(x,Gun) ∧ Isa(x,Fake)
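The intersective attachment can be sketched in the same string-based style (my own encoding). Note that the rule happily produces the unwanted reading for fake gun, which is exactly the problem the slide points out:

# Nom -> Adj Nom  {λx Nom.sem(x) ∧ Isa(x, Adj.sem)}  -- intersective semantics
def adj_nom(adj_sem, nom_sem):
    return lambda x: f"{nom_sem(x)} ∧ Isa({x},{adj_sem})"

food_sem = lambda x: f"Isa({x},Food)"
gun_sem  = lambda x: f"Isa({x},Gun)"

print(adj_nom("Cheap", food_sem)("x"))   # Isa(x,Food) ∧ Isa(x,Cheap)  -- fine
print(adj_nom("Fake", gun_sem)("x"))     # Isa(x,Gun) ∧ Isa(x,Fake) -- wrongly asserts it is a gun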

17
Doing Compositional Semantics
  • Incorporating compositional semantics into CFG
    requires
  • Right representation for each constituent based
    on the parts of that constituent (e.g. Adj)
  • Right representation for a category of
    constituents based on other grammar rules, making
    use of that constituent (e.g. V.sem)
  • This gives us a set of function-like semantic
    attachments incorporated into our CFG
  • E.g. Nom → Adj Nom  {λx Nom.sem(x) ∧
    Isa(x,Adj.sem)}
  • A number of formalisms extend CFGs to allow
    larger compositionality domains.

18
Computing the Semantic Representation
  • Two approaches
  • Compute the semantic representation of each
    constituent as the parser progresses through the
    rules.
  • Semantic representations could be used to rule
    out parses
  • Wasted time in constructing semantics for unused
    constituents.
  • Let the parser complete the syntactic parse and
    then recover the semantic representation.
  • in a bottom-up traversal.
  • Issues of ambiguous syntactic representation
  • Packing ambiguity
  • Underspecified semantics.

19
Non-Compositional Language
  • Non-compositional modifiers: fake, former, local
  • Metaphor
  • You're the cream in my coffee. She's the cream in
    George's coffee.
  • The break-in was just the tip of the iceberg.
  • This was only the tip of Shirley's iceberg.
  • Idioms
  • The old man finally kicked the bucket.
  • The old man finally kicked the proverbial bucket.
  • Deferred reference: The ham sandwich wants his
    check.
  • Solutions? Mix lexical items with special grammar
    rules? Or???

20
Lexical Semantics
  • Lexical Semantics

21
Thinking about Words Again
  • Lexeme: an entry in the lexicon that includes
  • an orthographic representation
  • a phonological form
  • a symbolic meaning representation or sense
  • Some typical dictionary entries
  • Red (red) n: the color of blood or a ruby
  • Blood (bluhd) n: the red liquid that circulates
    in the heart, arteries and veins of animals

22
  • Right (rIt) adj: located nearer the right hand,
    esp. being on the right when facing the same
    direction as the observer
  • Left (left) adj: located nearer to this side of
    the body than the right
  • Can we get semantics directly from online
    dictionary entries?
  • Some are circular
  • All are defined in terms of other lexemes
  • You have to know something to learn something
  • What can we learn from dictionaries?
  • Relations between words
  • Oppositions, similarities, hierarchies

23
Homonymy
  • Homonyms: words with the same form (orthography
    and pronunciation) but different, unrelated
    meanings, or senses (multiple lexemes)
  • A bank holds investments in a custodial account
    in the client's name.
  • As agriculture is burgeoning on the east bank,
    the river will shrink even more.
  • Word sense disambiguation: what clues?
  • Similar phenomena
  • homophones - read and red
  • same pronunciation/different orthography
  • homographs - bass and bass
  • same orthography/different pronunciation

24
Ambiguity Which applications will these cause
problems for?
  • A bass, the bank, /red/
  • General semantic interpretation
  • Machine translation
  • Spelling correction
  • Speech recognition
  • Text to speech
  • Information retrieval

25
Polysemy
  • Word with multiple but related meanings (same
    lexeme)
  • They rarely serve red meat.
  • He served as U.S. ambassador.
  • He might have served his time in prison.
  • What's the difference between polysemy and
    homonymy?
  • Homonymy
  • Distinct, unrelated meanings
  • Different etymology? Coincidental similarity?

26
  • Polysemy
  • Distinct but related meanings
  • idea bank, sperm bank, blood bank, bank bank
  • How different?
  • Different subcategorization frames?
  • Domain specificity?
  • Can the two candidate senses be conjoined?
  • ?He served his time and as ambassador to Norway.
  • For either, the practical tasks are:
  • What are its senses? (related or not)
  • How are they related? (polysemy easier here)
  • How can we distinguish them?

27
Tropes, or Figures of Speech
  • Metaphor: one entity is given the attributes of
    another (tenor/vehicle/ground)
  • Life is a bowl of cherries. Don't take it
    serious.
  • We are the eyelids of defeated caves. ??
  • Metonymy: one entity used to stand for another
    (replacive)
  • GM killed the Fiero.
  • The ham sandwich wants his check.
  • Both extend an existing sense to a new meaning
  • Metaphor: completely different concept
  • Metonymy: related concepts

28
Synonymy
  • Substitutability: different lexemes, same meaning
  • How big is that plane?
  • How large is that plane?
  • How big are you? Big brother is watching.
  • What influences substitutability?
  • Polysemy (large vs. old sense)
  • Register: He's really cheap/?parsimonious.
  • Collocational constraints
  • roast beef, ?baked beef
  • economy fare, ?economy price

29
Finding Synonyms and Collocations Automatically
from a Corpus
  • Synonyms: identify words appearing frequently in
    similar contexts
  • Blast victims were helped by civic-minded
    passersby.
  • Few passersby came to the aid of this crime
    victim.
  • Collocations: identify synonyms that don't appear
    in some specific similar contexts
  • flu victims, flu sufferers,
  • crime victims, ?crime sufferers,

30
Hyponymy
  • General: hypernym (superordinate)
  • dog is a hypernym of poodle
  • Specific: hyponym (underneath)
  • poodle is a hyponym of dog
  • Test: That is a poodle implies that is a dog
  • Ontology: set of domain objects
  • Taxonomy: specification of relations between
    those objects
  • Object hierarchy: structured hierarchy that
    supports feature inheritance (e.g. poodle
    inherits some properties of dog)

31
Semantic Networks
  • Used to represent lexical relationships
  • e.g. WordNet (George Miller et al)
  • Most widely used hierarchically organized lexical
    database for English
  • Synset: a set of synonyms, a dictionary-style
    definition (or gloss), and some examples of uses
    → a concept
  • Databases for nouns, verbs, and modifiers
  • Applications can traverse the network to find
    synonyms, antonyms, hierarchies, ... (see the
    sketch below)
  • Available for download or online use
  • http://www.cogsci.princeton.edu/~wn
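Assuming NLTK and its WordNet data are installed (pip install nltk, then nltk.download('wordnet') once), a short sketch of the kind of traversal described above:

from nltk.corpus import wordnet as wn

# All synsets (senses) of "bass", with their glosses
for syn in wn.synsets('bass'):
    print(syn.name(), '-', syn.definition())

# Hypernym traversal for one noun sense of "dog"
dog = wn.synset('dog.n.01')
print(dog.hypernyms())             # immediate superordinate synsets
print(dog.hypernym_paths()[0])     # one full path up from the root (entity.n.01)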

32
Using WN, e.g. in Question-Answering
  • Pasca & Harabagiu '01: results on the TREC corpus
  • Parses questions to determine question type and
    key words (Who invented the light bulb?)
  • Person question: invent, light, bulb
  • The modern world is an electrified world. It
    might be argued that any of a number of
    electrical appliances deserves a place on a list
    of the millennium's most significant inventions.
    The light bulb, in particular, profoundly changed
    human existence by illuminating the night and
    making it hospitable to a wide range of human
    activity. The electric light, one of the everyday
    conveniences that most affects our lives, was
    invented in 1879 simultaneously by Thomas Alva
    Edison in the United States and Sir Joseph Wilson
    Swan in England.
  • Finding named entities is not enough

33
  • Compare expected answer type to potential
    answers
  • For questions of type person, expect answer is
    person
  • Identify potential person names in passages
    retrieved by IR
  • Check in WN to find which of these are hyponyms
    of person
  • Or, consider reformulations of the question Who
    invented the light bulb (see the sketch below)
  • For key words in the query, look for WN synonyms
  • E.g. Who fabricated the light bulb?
  • Use this query for initial IR
  • Results improve system accuracy by 147% (on some
    question types)
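A minimal sketch of the reformulation idea, again assuming NLTK's WordNet; the query-expansion function is my own illustration, not the Pasca & Harabagiu system:

from nltk.corpus import wordnet as wn

def expand_query(keywords):
    """Add WordNet synonyms (verb senses) of each keyword to the IR query."""
    expanded = set(keywords)
    for word in keywords:
        for syn in wn.synsets(word, pos=wn.VERB):
            expanded.update(lemma.name() for lemma in syn.lemmas())
    return expanded

print(expand_query(['invent']))   # may include 'devise', 'contrive', 'forge', ...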

34
Thematic Roles
  • ∃w,x,y,z Giving(x) ∧ Giver(w,x) ∧ Givee(z,x) ∧
    Given(y,x)
  • A set of roles for each event
  • Agent: volitional causer -- John hit Bill.
  • Experiencer: experiencer of the event -- Bill got
    a headache.
  • Force: non-volitional causer -- The concrete block
    struck Bill on the head.
  • Theme/patient: most affected participant -- John
    hit Bill.
  • Result: end product -- Bill got a headache.
  • Content: proposition of a propositional event --
    Bill thought he should take up martial arts.

35
  • Instrument: instrument used -- John hit Bill
    with a bat
  • Beneficiary: cui bono -- John hit Bill to avenge
    his friend
  • Source: origin of object of a transfer event --
    Bill fled from New York to Timbuktu
  • Goal: destination of object -- Bill fled from New
    York to Timbuktu
  • But there are a lot of verbs, with a lot of
    frames
  • FrameNet encodes frames for many verb categories

36
Thematic Roles and Selectional Restrictions
  • Selectional restrictions: semantic constraints
    that a word (lexeme) imposes on the concepts that
    go with it
  • George hit Bill with
  • ... John / a gun / gusto.
  • Jim killed his philodendron/a fly/Bill.
  • ?His philodendron killed Jim.
  • The flu/Misery killed Jim.

37
Thematic Roles/Selectional Restrictions
  • In practical use
  • Given e.g. a verb and a corpus (plus FrameNet)
  • What conceptual roles are likely to accompany it?
  • What lexemes are likely to fill those roles?
  • Assassinate
  • Give
  • Imagine
  • Fall
  • Serve

38
Schank's Conceptual Dependency
  • Eleven predicate primitives represent all
    predicates
  • Objects decomposed into primitive categories and
    modifiers
  • But: using only a few predicates results in very
    complex representations of simple things
  • ∃x,y Atrans(x) ∧ Actor(x,John) ∧ Object(x,Book) ∧
    To(x,Mary) ∧ Ptrans(y) ∧ Actor(y,John) ∧
    Object(y,Book) ∧ To(y,Mary)
  • John caused Mary to die vs. John killed Mary

39
Robust Semantics, Information Extraction, and
Information Retrieval
40
Problems with Syntax-Driven Semantics
  • Compositionality
  • Expects correspondence between syntactic and
    semantic structures.
  • Mismatch between syntactic structures and
    semantic structures -- certainly not rule-to-rule
    (inadequacy of CFGs)
  • I like soup. Soup is what I like.
  • Constituent trees contain many structural
    elements not clearly important to making semantic
    distinctions
  • Resort to dependency trees.
  • Too abstract: syntax-driven semantic
    representations are sometimes very abstract.
  • Nominal → Adjective Nominal  {λx Nominal.sem(x) ∧
    AM(x,Adj.sem)}
  • Cheap restaurant, Italian restaurant, local
    restaurant
  • Robust semantic processing: a trade-off between
  • Portability
  • Expressivity

41
Semantic Grammars
  • Before:
  • CFG with syntactic categories, with semantic
    representation composition overlaid
  • Now:
  • CFG with domain-specific semantic categories
  • Domain specific: rules correspond directly to
    entities and activities in the domain
  • I want to go from Boston to Baltimore on
    Thursday, September 24th
  • Greeting → Hello | Hi | Um ...
  • TripRequest → Need-spec travel-verb from City to
    City on Date (see the sketch below)
  • Note: semantic grammars are still CFGs.
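A toy sketch of how such a domain rule might be realized as a slot-filling pattern; the regular expression and slot names are purely illustrative, not the actual grammar formalism:

import re

# TripRequest -> Need-spec travel-verb from City to City on Date  (toy version)
TRIP = re.compile(
    r"(?:i want|i need|i'd like) to (?:go|fly|travel) "
    r"from (?P<origin>\w+) to (?P<dest>\w+) on (?P<date>.+)", re.I)

def parse_trip(utterance):
    m = TRIP.search(utterance)
    return m.groupdict() if m else None   # slot-filler frame, or no parse

print(parse_trip("I want to go from Boston to Baltimore on Thursday, September 24th"))
# {'origin': 'Boston', 'dest': 'Baltimore', 'date': 'Thursday, September 24th'}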

42
Pros and Cons of Semantic Grammars
  • Semantic grammars encode task knowledge and
    constrain the range of possible user input.
  • I want to go to Boston on Thursday.
  • I want to leave from there on Friday for
    Baltimore.
  • TripRequest → Need-spec travel-verb from City on
    Date for City
  • The semantic representation is a slot-filler,
    frame-like representation crafted for that
    domain.
  • Portability: lack of generality
  • A new grammar for each application
  • Large cost in development time
  • Robustness: if users go outside the grammar,
    things may break disastrously
  • I want to go from ah to Boston from Newark
  • Expressivity
  • I want to go to Boston from Newark or New York

43
Information Extraction
  • Another robust alternative
  • Idea: extract particular types of information
    from arbitrary text or transcribed speech
  • Examples
  • Named entities: people, places, organizations,
    times, dates
  • <Organization>MIPS</Organization> Vice President
    <Person>John Hime</Person>
  • MUC evaluations
  • Domains: medical texts, broadcast news (terrorist
    reports), company mergers, customer care
    voicemail, ...

44
Appropriate where Semantic Grammars and
Syntactic Parsers are Not
  • Appropriate where information needs are very
    specific and specifiable in advance
  • Question answering systems, gisting of news or
    mail
  • Job ads, financial information, terrorist attacks
  • Input too complex and far-ranging to build
    semantic grammars
  • But full-blown syntactic parsers are impractical
  • Too much ambiguity for arbitrary text
  • 50 parses or none at all
  • Too slow for real-time applications

45
Information Extraction Techniques
  • Often use a set of simple templates or frames
    with slots to be filled in from input text
  • Ignore everything else
  • My number is 212-555-1212.
  • The inventor of the wiggleswort was Capt. John T.
    Hart.
  • The king died in March of 1932.
  • Generative model
  • POS-style HMM model (with novel encoding)
  • The/O king/O died/O in/O March/I of/I 1932/I in/O
    France/O
  • T* = argmax_T P(W|T) P(T)
  • Context
  • Neighboring words, capitalization, and punctuation
    can be used as well (see the sketch below).
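For the template-and-slot flavour of IE described above (fill the slots of interest, ignore everything else), a minimal sketch; the slot names and patterns are invented for illustration:

import re

TEMPLATES = {
    "phone": re.compile(r"\b(\d{3}-\d{3}-\d{4})\b"),
    "date":  re.compile(r"\b((?:January|February|March|April|May|June|July|"
                        r"August|September|October|November|December)\s+(?:of\s+)?\d{4})\b"),
}

def extract(text):
    """Fill template slots from text; everything else is ignored."""
    return {slot: pattern.findall(text) for slot, pattern in TEMPLATES.items()}

print(extract("My number is 212-555-1212. The king died in March of 1932."))
# {'phone': ['212-555-1212'], 'date': ['March of 1932']}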

46
Discriminative Disambiguation Techniques
  • A large set of features makes MLE estimation of
    the parameters unreliable.
  • P(T|W) = ∏i P(ti | W, POS, Ortho)
  • P(ti | wi-k ... wi+k, posi-k ... posi+k, orthoi)
  • Direct approach:
  • F(ti, wi-k ... wi+k, posi-k ... posi+k, orthoi) =
    F(y, X)
  • Maximum Entropy Markov Models, Conditional
    Random Fields

47
ScanMail Transcription
gender F age A caller_name NA native_speaker
N speech_pathology N sample_rate 8000 label 0
804672 " Greeting hi R__ CallerID it's me
give me a call um right away cos there's
.hn I guess there's some .hn change Date
tomorrow with the nursery school and they um
.hn anyway they had this idea cos since
I think J__'s the only one staying Date
tomorrow for play club so they wanted to they
suggested that .hn well J2__actually offered
to take J__home with her and then would she would
meet you back at the synagogue at Time five
thirty to pick her up .hn uh so I don't
know how you feel about that otherwise Miriam and
one other teacher would stay and take care of her
till Date five thirty tomorrow but if you
.hn I wanted to know how you feel before I tell
her one way or the other so call me .hn right
away cos I have to get back to her in about an
hour so .hn okay Closing bye .nhn
.onhk " duration "50.3 seconds"
48

SCANMail Access Devices
PC Pocket PC Dataphone Voice Phone Flash E-mail
49
Word Sense Disambiguation
  • Word Sense Disambiguation

50
Disambiguation via Selectional Restrictions
  • A step toward semantic parsing
  • Different verbs select for different thematic
    roles
  • wash the dishes (takes washable-thing as patient)
  • serve delicious dishes (takes food-type as
    patient)
  • Method: rule-to-rule syntactico-semantic analysis
  • Semantic attachment rules are applied as
    sentences are syntactically parsed
  • VP → V NP
  • V → serve <theme>  {theme: food-type}
  • Selectional restriction violation: no parse

51
  • Requires
  • Write selectional restrictions for each sense of
    each predicate or use FrameNet
  • serve alone has 15 verb senses
  • Hierarchical type information about each argument
    (a la WordNet)
  • How many hypernyms does dish have?
  • How many lexemes are hyponyms of dish?
  • But also
  • Sometimes selectional restrictions don't restrict
    enough (Which dishes do you like?)
  • Sometimes they restrict too much (Eat dirt, worm!
    I'll eat my hat!)

52
Can we take a more statistical approach?
  • How likely is dish/crockery to be the object of
    serve? dish/food?
  • A simple approach (baseline): predict the most
    likely sense
  • Why might this work?
  • When will it fail?
  • A better approach: learn from a tagged corpus
  • What needs to be tagged?
  • An even better approach: Resnik's selectional
    association (1997, 1998)
  • Estimate conditional probabilities of word senses
    from a corpus tagged only with verbs and their
    arguments (e.g. ragout is an object of served --
    Jane served/V ragout/Obj)

53
  • How do we get the word sense probabilities?
  • For each verb object (e.g. ragout)
  • Look up hypernym classes in WordNet
  • Distribute credit for this object sense
    occurring with this verb among all the classes to
    which the object belongs
  • Brian served/V the dish/Obj
  • Jane served/V food/Obj
  • If ragout has N hypernym classes in WordNet, add
    1/N to each class count (including food) as
    object of serve
  • If tureen has M hypernym classes in WordNet, add
    1/M to each class count (including dish) as
    object of serve
  • Pr(Class | v) = count(c,v) / count(v)
  • How can this work?
  • Ambiguous words have many superordinate classes
  • John served food/the dish/tuna/curry
  • There is a common sense among these which gets
    credit in each instance, eventually dominating
    the likelihood score
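A sketch of the credit-distribution step, using NLTK's WordNet hypernym paths as the classes; the counting scheme follows the description above, but the code itself is my own illustration rather than Resnik's implementation:

from collections import defaultdict
from nltk.corpus import wordnet as wn

class_count = defaultdict(float)   # count(c, v) for the verb "serve"
verb_count = 0.0                   # count(v)

def add_object(noun):
    """Distribute one observation of (serve, noun) over the noun's hypernym classes."""
    global verb_count
    classes = set()
    for syn in wn.synsets(noun, pos=wn.NOUN):
        for path in syn.hypernym_paths():
            classes.update(path)
    if not classes:
        return
    verb_count += 1
    for c in classes:                       # 1/N credit to each of the N classes
        class_count[c] += 1.0 / len(classes)

for obj in ["dish", "food", "ragout", "curry"]:   # observed objects of "serve"
    add_object(obj)

# Pr(class | verb) = count(c, v) / count(v); classes shared by all the observed
# objects (food-related ones, and very general ones like entity) get the most credit.
top = sorted(class_count, key=class_count.get, reverse=True)[:5]
for c in top:
    print(c, round(class_count[c] / verb_count, 3))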

54
  • To determine most likely sense of bass in Bill
    served bass
  • Having previously assigned credit for the
    occurrence of all hypernyms of things like fish
    and things like musical instruments to all their
    hypernym classes (e.g. fish and musical
    instruments)
  • Find the hypernym classes of bass (including
    fish and musical instruments)
  • Choose the class C with the highest probability,
    given that the verb is serve
  • Results
  • Baselines:
  • random choice of word sense: 26.8%
  • choose most frequent sense (NB: requires a
    sense-labeled training corpus): 58.2%
  • Resnik's: 44% correct with only pred/arg
    relations labeled

55
Machine Learning Approaches
  • Learn a classifier to assign one of the possible
    word senses to each word
  • Acquire knowledge from labeled or unlabeled
    corpus
  • Human intervention only in labeling corpus and
    selecting set of features to use in training
  • Input feature vectors
  • Target (dependent variable)
  • Context (set of independent variables)
  • Output classification rules for unseen text

56
Supervised Learning
  • Training and test sets with words labeled as to
    correct sense (It was the biggest fish bass
    I've seen.)
  • Obtain values of independent variables
    automatically (POS, co-occurrence information, ...)
  • Run classifier on training data
  • Test on test data
  • Result Classifier for use on unlabeled data

57
Input Features for WSD
  • POS tags of target and neighbors
  • Surrounding context words (stemmed or not)
  • Punctuation, capitalization and formatting
  • Partial parsing to identify thematic/grammatical
    roles and relations
  • Collocational information
  • How likely are the target and its left/right
    neighbor to co-occur?
  • Co-occurrence of neighboring words
  • Intuition: how often does sea (or other related
    words) occur near bass?

58
  • How do we proceed?
  • Look at a window around the word to be
    disambiguated, in training data
  • Which features accurately predict the correct
    tag?
  • Can you think of other features that might be
    useful in general for WSD?
  • Input to the learner, e.g.
  • Is the bass fresh today?
  • w-2, w-2/pos, w-1, w-1/pos, w+1, w+1/pos, w+2,
    w+2/pos
  • is, V, the, DET, fresh, RB, today, N ... (see the
    sketch below)
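A minimal sketch of building that kind of feature vector from a +/-2 word window; the token/POS pairs would normally come from a tagger but are hard-coded here for illustration:

def window_features(tagged, target_index, k=2):
    """Collect words and POS tags in a +/-k window around the target word."""
    feats = {}
    for offset in range(-k, k + 1):
        if offset == 0:
            continue
        i = target_index + offset
        if 0 <= i < len(tagged):
            word, pos = tagged[i]
            feats[f"w{offset:+d}"] = word
            feats[f"w{offset:+d}/pos"] = pos
    return feats

# "Is the bass fresh today?" with the target word "bass" (toy POS tags)
tagged = [("is", "V"), ("the", "DET"), ("bass", "N"), ("fresh", "ADJ"), ("today", "N")]
print(window_features(tagged, target_index=2))
# {'w-2': 'is', 'w-2/pos': 'V', 'w-1': 'the', ..., 'w+2': 'today', 'w+2/pos': 'N'}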

59
Types of Classifiers
  • Naïve Bayes
  • ŝ = argmax_s P(s|V), or equivalently
    argmax_s P(V|s) P(s) / P(V)
  • where s is one of the possible senses and V is the
    input vector of features
  • Assume the features are independent, so the
    probability of V is the product of the probabilities
    of each feature given s: P(V|s) = ∏j P(vj|s)
  • and P(V) is the same for any s
  • Then ŝ = argmax_s P(s) ∏j P(vj|s) (see the sketch
    below)
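A compact sketch of that decision rule with add-one smoothing; the tiny sense-labeled training set is invented purely for illustration:

from collections import Counter, defaultdict
import math

# Toy sense-labeled training data: (feature words, sense of "bass")
train = [(["fish", "fresh", "caught"],   "bass_fish"),
         (["sea", "fish", "boat"],       "bass_fish"),
         (["guitar", "player", "band"],  "bass_music"),
         (["music", "play", "band"],     "bass_music")]

sense_count = Counter(s for _, s in train)
feat_count = defaultdict(Counter)
vocab = set()
for feats, s in train:
    feat_count[s].update(feats)
    vocab.update(feats)

def classify(feats):
    """argmax_s  log P(s) + sum_j log P(v_j | s), with add-one smoothing."""
    best, best_score = None, float("-inf")
    for s, n_s in sense_count.items():
        total = sum(feat_count[s].values())
        score = math.log(n_s / len(train))
        for f in feats:
            score += math.log((feat_count[s][f] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = s, score
    return best

print(classify(["fresh", "fish"]))     # -> bass_fish
print(classify(["band", "guitar"]))    # -> bass_music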

60
Rule Induction Learners (e.g. Ripper)
  • Given a feature vector of values for independent
    variables associated with observations of values
    for the training set (e.g. fishing, NP, 3, ... →
    bass2)
  • Produce a set of rules that perform best on the
    training data, e.g.
  • bass2 if w-1 == fishing and pos == NP

61
Decision Lists
  • Like case statements applying tests to input in
    turn
  • fish within window → bass1
  • striped bass → bass1
  • guitar within window → bass2
  • bass player → bass2
  • Ordering is based on individual accuracy on the
    entire training set, based on the log-likelihood
    ratio

62
  • Bootstrapping I
  • Start with a few labeled instances of target item
    as seeds to train initial classifier, C
  • Use high confidence classifications of C on
    unlabeled data as training data
  • Iterate
  • Bootstrapping II
  • Start with sentences containing words strongly
    associated with each sense (e.g. sea and music
    for bass), either intuitively or from corpus or
    from dictionary entries
  • One Sense per Discourse hypothesis

63
Unsupervised Learning
  • Cluster feature vectors to discover word senses
    using some similarity metric (e.g. cosine
    distance)
  • Represent each cluster as average of feature
    vectors it contains
  • Label clusters by hand with known senses
  • Classify unseen instances by proximity to these
    known and labeled clusters
  • Evaluation problem
  • What are the right senses?
  • Cluster impurity
  • How do you know how many clusters to create?
  • Some clusters may not map to known senses

64
Dictionary Approaches
  • Problem of scale for all ML approaches
  • Build a classifier for each sense ambiguity
  • Machine-readable dictionaries (Lesk '86); a
    simplified sketch follows below
  • Retrieve all definitions of content words
    occurring in the context of the target (e.g. the
    happy seafarer ate the bass)
  • Compare for overlap with the sense definitions of
    the target entry (bass2: a type of fish that lives
    in the sea)
  • Choose the sense with the most overlap
  • Limits: entries are short → expand entries to
    related words
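A simplified Lesk sketch using NLTK's WordNet glosses as the machine-readable dictionary; stop-word handling and other refinements are omitted, and the function is my own minimal version of the idea:

from nltk.corpus import wordnet as wn

def simplified_lesk(target, context_words):
    """Pick the sense of `target` whose gloss overlaps most with the context."""
    context = set(w.lower() for w in context_words)
    best, best_overlap = None, -1
    for syn in wn.synsets(target):
        gloss = set(syn.definition().lower().split())
        overlap = len(gloss & context)
        if overlap > best_overlap:
            best, best_overlap = syn, overlap
    return best

sense = simplified_lesk("bass", "the happy seafarer ate the bass near the sea".split())
print(sense, "-", sense.definition() if sense else None)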