Title: CPSC 503 Computational Linguistics
1 CPSC 503 Computational Linguistics
- Word-Sense Disambiguation
- Information Retrieval
- Lecture 18
- Giuseppe Carenini
2 Semantics Summary
- What meaning is and how to represent it
- How to map sentences into their meaning
- Meaning of individual words
- Tasks
- Information Extraction
- Word Sense Disambiguation
- Information Retrieval
3 Today 24/3
- Word-Sense Disambiguation
- Machine Learning Approaches
- Information Retrieval (ad hoc)
4 Supervised ML Approaches to WSD
5 Training Data Example
((word + context) → sense)_i
- ...after the soup she had bass with a big salad
6 WordNet: "bass", music vs. fish
- The noun "bass" has 8 senses in WordNet:
  1. bass - (the lowest part of the musical range)
  2. bass, bass part - (the lowest part in polyphonic music)
  3. bass, basso - (an adult male singer with ...)
  4. sea bass, bass - (flesh of lean-fleshed saltwater fish of the family Serranidae)
  5. freshwater bass, bass - (any of various North American lean-fleshed ...)
  6. bass, bass voice, basso - (the lowest adult male singing voice)
  7. bass - (the member with the lowest range of a family of musical instruments)
  8. bass - (nontechnical name for any of numerous edible marine and freshwater spiny-finned fishes)
7 Representations for Context
- GOAL: an informative characterization of the window of text surrounding the target word
- TASK: select relevant linguistic information and encode it as a feature vector
8 Relevant Linguistic Information (1)
- Collocational info: the words that appear in specific positions to the right and left of the target word
  - Typically the words themselves and their POS:
    [word in position -n, part-of-speech in position -n, ..., word in position +n, part-of-speech in position +n]
  - Assume a window of +/-2 around the target
- Example text (WSJ)
  - "An electric guitar and bass player stand off to one side, not really part of the scene, ..."
  - [guitar, NN, and, CJC, player, NN, stand, VVB]
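A minimal sketch of extracting such a collocational feature vector, assuming the sentence is already POS-tagged (the tags for the words outside the slide's example vector are illustrative guesses):

    from typing import List, Tuple

    def collocational_features(tagged: List[Tuple[str, str]],
                               target_idx: int,
                               window: int = 2) -> List[str]:
        """Collect [word, POS] pairs for each position within +/-window
        of the target word, padding at sentence boundaries."""
        feats = []
        for offset in range(-window, window + 1):
            if offset == 0:
                continue  # skip the target word itself
            i = target_idx + offset
            word, pos = tagged[i] if 0 <= i < len(tagged) else ("<pad>", "<pad>")
            feats.extend([word, pos])
        return feats

    # The WSJ example from the slide; "bass" is at index 4.
    sent = [("An", "AT0"), ("electric", "AJ0"), ("guitar", "NN"),
            ("and", "CJC"), ("bass", "NN"), ("player", "NN"),
            ("stand", "VVB"), ("off", "AVP")]
    print(collocational_features(sent, target_idx=4))
    # ['guitar', 'NN', 'and', 'CJC', 'player', 'NN', 'stand', 'VVB']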
9 Relevant Linguistic Information (2)
- Co-occurrence info: the words that occur anywhere in the window, regardless of position
- Find the k content words that most frequently co-occur with the target in the corpus (for bass: fishing, big, sound, player, fly, ..., guitar, band)
  - Vector for one case: [c(fishing), c(big), c(sound), c(player), c(fly), ..., c(guitar), c(band)]
- Example text (WSJ)
  - "An electric guitar and bass player stand off to one side, not really part of the scene, ..."
  - [0,0,0,1,0,0,0,0,0,0,1,0]
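A sketch of building that count vector. The slide names only 7 of the 12 vocabulary words; the filler words below follow the standard textbook bass example and should be treated as an assumption:

    from collections import Counter
    from typing import List

    def cooccurrence_vector(context: List[str], vocab: List[str]) -> List[int]:
        """Count how often each chosen content word occurs anywhere
        in the window around the target, regardless of position."""
        counts = Counter(w.lower().strip(".,") for w in context)
        return [counts[w] for w in vocab]

    # 12-word content vocabulary for "bass" (filler words are assumed).
    vocab = ["fishing", "big", "sound", "player", "fly",
             "rod", "pound", "double", "runs", "playing", "guitar", "band"]
    window = "An electric guitar and bass player stand off to one side".split()
    print(cooccurrence_vector(window, vocab))
    # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0] -- matches the slide's example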
10 ML for Classifiers
- Training data: collocational and co-occurrence feature vectors
- A machine-learning algorithm trained on that data outputs a classifier
- Candidate learners:
  - Naïve Bayes
  - Decision lists
  - Decision trees
  - Neural nets
  - Support vector machines
  - Nearest-neighbor methods
11 Naïve Bayes
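The formula on this slide did not survive extraction. The standard Naïve Bayes decision rule for WSD, which the slide presumably showed, picks the sense s that maximizes the posterior given the feature vector f = (f_1, ..., f_n), assuming the features are conditionally independent given the sense:

    \hat{s} = \arg\max_{s \in S} P(s \mid \vec{f})
            = \arg\max_{s \in S} \frac{P(\vec{f} \mid s)\,P(s)}{P(\vec{f})}
            \approx \arg\max_{s \in S} P(s) \prod_{j=1}^{n} P(f_j \mid s)

P(s) and the P(f_j | s) are estimated from the sense-labeled training data by (smoothed) relative frequencies.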
12 Naïve Bayes Evaluation
- Experiment comparing different classifiers (Mooney, 1996)
- Naïve Bayes and a neural network achieved the highest performance: 73% in assigning one of six senses to "line"
13 Bootstrapping
- What if you don't have enough data to train a system?
14 Bootstrapping: how to pick the seeds
- Hand-labeling
  - Likely correct
  - Likely to be prototypical
- One sense per collocation: search for words or phrases strongly associated with the target senses, then label automatically (see the sketch after this slide)
- E.g., bass
  - "play" is strongly associated with the music sense, whereas "fish" is strongly associated with the fish sense
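A minimal sketch of that automatic seed labeling, the first step of Yarowsky-style bootstrapping; the seed words and example contexts are hypothetical:

    def seed_label(instances, seeds):
        """Label each instance whose context contains exactly one seed
        collocation; everything else stays unlabeled and is handled
        by later bootstrapping iterations."""
        labeled = []
        for context in instances:
            words = {w.lower() for w in context.split()}
            senses = {sense for word, sense in seeds.items() if word in words}
            labeled.append((context, senses.pop() if len(senses) == 1 else None))
        return labeled

    seeds = {"play": "MUSIC", "fish": "FISH"}
    data = ["we like to play bass in a band",
            "went out to fish for bass on the lake",
            "the bass was huge"]
    print(seed_label(data, seeds))
    # [(..., 'MUSIC'), (..., 'FISH'), (..., None)]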
15 Unsupervised Methods (Schütze, 1998)
- Training data: (word vector)_1, ..., (word vector)_n
- A machine-learning (clustering) algorithm groups the vectors into K clusters c_i
16 Agglomerative Clustering
- Assign each instance to its own cluster
- Repeat
  - Merge the two clusters that are most similar (see the sketch below)
- Until the specified number of clusters is reached
- If there are too many training instances → random sampling
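A naive sketch of the loop, assuming average-link cosine similarity between clusters (the slide does not specify the linkage or the similarity measure):

    import math

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a*a for a in u)) * math.sqrt(sum(b*b for b in v)))

    def agglomerative(vectors, k):
        """O(n^3) agglomerative clustering down to k clusters,
        merging the pair with highest average-link cosine similarity."""
        clusters = [[i] for i in range(len(vectors))]   # one instance per cluster
        while len(clusters) > k:
            best, pair = -1.0, None
            for a in range(len(clusters)):              # find the most similar pair
                for b in range(a + 1, len(clusters)):
                    sim = sum(cosine(vectors[i], vectors[j])
                              for i in clusters[a] for j in clusters[b])
                    sim /= len(clusters[a]) * len(clusters[b])
                    if sim > best:
                        best, pair = sim, (a, b)
            a, b = pair
            clusters[a].extend(clusters[b])             # merge the winning pair
            del clusters[b]
        return clusters

    print(agglomerative([(1, 0), (0.9, 0.1), (0, 1), (0.1, 0.9)], k=2))
    # [[0, 1], [2, 3]]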
17 Problems
- Given these general ML approaches, how many classifiers do we need to perform WSD robustly?
  - One for each ambiguous word in the language
- How do you decide what set of tags/labels/senses to use for a given word?
  - Depends on the application
18 Recent Work on WSD
- "Word Sense Disambiguation: Recent Successes and Future Directions", a SIGLEX/SENSEVAL workshop at ACL 2002, University of Pennsylvania
19 Today 24/3
- Word-Sense Disambiguation
- Machine Learning Approaches
- Information Retrieval (ad hoc)
20 Information Retrieval
- Retrieving relevant documents from document repositories
- Sub-areas
  - Ad hoc retrieval (query → list of documents)
  - Text categorization (document → category)
    - E.g., business news → (OIL, ACQ, ...)
  - Filtering (a special case of TC with 2 categories: relevant/non-relevant)
21 Information Retrieval
- Bag-of-words assumption in modern IR: the meaning of a document is captured by analyzing (counting) the words that occur in it
  - Efficient
  - Works in practice
- Tobias Scheffer and Stefan Wrobel. Text classification beyond the bag-of-words representation. In Proceedings of the ICML Workshop on Text Learning, 2002.
22 IR Terminology
- Documents
  - Any contiguous bunch of text (e.g., a news article, web page, or paragraph)
- Collection
  - A bunch of documents
- Terms
  - Words that occur in a collection (but they may include common phrases, e.g., "car insurance")
- Query
  - Terms that express an information need
23 Term Selection and Creation
- Stop list? A list of frequent, largely content-free words that are not considered ("of", "the", "a", "to", etc.)
- Stemming? Are terms stems or words?
  - E.g., are "dog" and "dogs" separate terms, or are they collapsed to "dog"?
- Phrases? Include the most frequent bigrams as phrases
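A toy sketch of these choices, with a tiny stop list and a crude suffix stripper standing in for a real stemmer such as Porter's (the stop list and suffix rules are illustrative):

    STOP_LIST = {"of", "the", "a", "to", "and", "in", "is"}  # tiny illustrative stop list

    def naive_stem(word):
        """Crude suffix stripping as a stand-in for a real stemmer
        (e.g., the Porter stemmer); collapses 'dogs' -> 'dog'."""
        for suffix in ("ing", "es", "s"):
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[:-len(suffix)]
        return word

    def terms(text):
        tokens = [t.lower().strip(".,?!") for t in text.split()]
        return [naive_stem(t) for t in tokens if t and t not in STOP_LIST]

    print(terms("The dogs ran to the park."))  # ['dog', 'ran', 'park']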
24 Ad hoc Ranked Retrieval
- Documents in the collection are ranked by relevance: d_1, d_2, ..., d_M
- Each document is represented as a vector of term weights (t_1, ..., t_N)
- What should a t_i express?
25 First Approximation: Bit Vector
- t_i = 1 if the corresponding word type occurs in the document (t_i = 0 otherwise)
- Is this a satisfying solution?
26 Better Term Weighting
- Local weight: how important is this term to the meaning of this document?
- Global weight: how well does this term discriminate among the documents in the collection?
  - The more documents a term occurs in, the less important it is
- SOLUTION: combine local and global weights (see the formula below)
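The slide leaves the combination unspecified, but the standard instantiation is tf-idf: the term frequency (local weight) scaled by the inverse document frequency (global weight):

    w_{i,j} = \mathrm{tf}_{i,j} \times \log \frac{N}{\mathrm{df}_i}

where tf_{i,j} is the number of times term i occurs in document j, df_i is the number of documents that contain term i, and N is the total number of documents in the collection.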
27 New Similarity: the Cosine Measure
- The dot product of the length-normalized query and document vectors (formula below)
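The formula itself was an image on the slide; the cosine measure it refers to is:

    \mathrm{sim}(\vec{q},\vec{d})
      = \frac{\vec{q} \cdot \vec{d}}{|\vec{q}|\,|\vec{d}|}
      = \frac{\sum_{i=1}^{N} q_i d_i}{\sqrt{\sum_{i=1}^{N} q_i^2}\,\sqrt{\sum_{i=1}^{N} d_i^2}}

Dividing by the vector lengths is the normalization the slide highlights: it keeps long documents from dominating the ranking simply because they contain more terms.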
28 Ad Hoc Retrieval Summary
- Given a user's query, find all the documents that contain any of the terms in the query
  - Why only those documents?
- Convert the query to a vector
- Compute the cosine between the query vector and all the candidate document vectors, then sort (a sketch follows)
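A self-contained sketch of that pipeline, using raw term counts in place of tf-idf weights to stay short (the document names and texts are made up):

    import math
    from collections import Counter

    def cosine(q, d):
        """Cosine between two sparse term-weight vectors (dicts)."""
        dot = sum(w * d.get(t, 0.0) for t, w in q.items())
        norm = math.sqrt(sum(w*w for w in q.values())) * math.sqrt(sum(w*w for w in d.values()))
        return dot / norm if norm else 0.0

    def rank(query, docs):
        """Keep only documents sharing a term with the query,
        score each against the query vector, and sort by cosine."""
        qterms = set(query.lower().split())
        qvec = Counter(query.lower().split())
        candidates = {name: Counter(text.lower().split())
                      for name, text in docs.items()
                      if qterms & set(text.lower().split())}
        return sorted(((cosine(qvec, c), name) for name, c in candidates.items()),
                      reverse=True)

    docs = {"d1": "bass guitar and bass player",
            "d2": "fishing for sea bass",
            "d3": "stock market news"}
    print(rank("bass player", docs))  # d1 ranks above d2; d3 has no query terms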
29 IR Evaluation (1)
- What do we want?
  - We want documents relevant to the query to be near the top of the ranked list d_1, d_2, ..., d_M
- Use a test collection where you have
  - A set of documents
  - A set of queries
  - A set of relevance judgments that tell you which documents are relevant to each query
30 IR Evaluation (2)
- Can we use precision and recall?
  - Precision = relevant docs returned / docs returned
  - Recall = relevant docs returned / total relevant docs
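Both measures depend on where the ranked list is cut off; a small sketch with made-up relevance judgments:

    def precision_recall(ranked, relevant, k):
        """Precision and recall over the top-k of a ranked list,
        given the set of documents judged relevant for the query."""
        returned = ranked[:k]
        hits = sum(1 for d in returned if d in relevant)
        return hits / len(returned), hits / len(relevant)

    ranked = ["d3", "d1", "d7", "d2", "d5"]
    relevant = {"d1", "d2", "d9"}
    print(precision_recall(ranked, relevant, k=4))
    # (0.5, 0.666...): 2 of the 4 returned docs are relevant, 2 of 3 relevant docs returned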
31 Precision and Recall Plots
- [Plot: precision (y-axis, 0 to 1) against recall (x-axis, 0 to 1); the operating point moves along the curve as the cut-off is raised]
32 IR Current Research
- TREC (Text Retrieval Conference)
  - Large document sets for testing
  - Uniform scoring systems
- Different tracks
  - Interactive Track: studying user interaction with text retrieval systems
  - Question Answering Track
  - Web Track
  - Terabyte Track
  - ...
33 Next Time
- Discourse and Dialogue: Chapters 18 and 19