Title: Domain-Specific Sense Distributions and Predominant Sense Acquisition
1. Domain-Specific Sense Distributions and Predominant Sense Acquisition
- Rob Koeling, Diana McCarthy, John Carroll
- University of Sussex
2. Overview
- Motivation
- Finding Predominant Senses
- Creating the 3 Gold Standards
- Predominant Sense Evaluation
- Conclusions / Future Work
3. Motivation
- Distributions of word senses are often highly skewed
- Manually sense-tagged data (SemCor) is used for WSD
  - Either as training data for a statistical model
  - Or as back-off if the primary model fails
- The first sense heuristic is powerful
  - 61.5% on the Senseval-3 all-words task
- Information about the domain of a document can help WSD
4. Motivation
- Consider the word goal (WordNet 1.7.1)
- Synonyms/hypernyms (ordered by estimated frequency) of the noun goal; 4 senses of goal:
  - Sense 1: goal, end, objective => content, cognitive content, mental object
  - Sense 2: goal => score
  - Sense 3: goal => game equipment
  - Sense 4: destination, goal => point
5. Motivation
- Sense distributions (for some words) are domain specific
- Beneficial for WSD
- No existing domain-specific sense-tagged corpora
- Automatic ranking applied to specific domains
6. Goal (mental object)
- Create sense-tagged data for a selection of 40 words for different domains
- Characterise the annotated data
- Apply the automatic ranking method to these domains and evaluate on the sense-tagged data
7. Finding Predominant Senses
- Ingredients (sketched in code below)
  - Automatically created thesaurus (e.g. Lin 1998)
    - goal: aim (0.3), win (0.2), ..., victory (0.1)
  - Sense inventory (e.g. WordNet)
    - 1) objective, 2) score, 3) game equipment, 4) destination
  - Sense similarity score (e.g. as defined in the WordNet Similarity package)
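To make the ingredients concrete, here is a minimal sketch (not the authors' code) of how the three resources could be represented, using NLTK's WordNet interface as a stand-in for the sense inventory and the similarity measure; the neighbour scores are the toy figures from the slide.

```python
from nltk.corpus import wordnet as wn

# 1) Automatically created distributional thesaurus (e.g. Lin 1998):
#    each target word maps to its distributionally similar neighbours and scores.
#    The values below are the toy figures from the slide, not real output.
thesaurus = {
    "goal": [("aim", 0.3), ("win", 0.2), ("victory", 0.1)],
}

# 2) Sense inventory: the noun senses of 'goal' in WordNet
#    (objective, score, game equipment, destination).
senses = wn.synsets("goal", pos=wn.NOUN)

# 3) Sense similarity score: any WordNet-based measure; Wu-Palmer similarity is
#    used here purely as an illustration (the talk refers to the
#    WordNet Similarity package).
def sense_similarity(synset_a, synset_b):
    return synset_a.wup_similarity(synset_b) or 0.0
```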
8-12. Calculating the Prevalence of Word Senses
GOAL
Neighbours: aim (0.3), win (0.2), ..., victory (0.1)
Senses: 1) objective, 2) score, 3) game equipment, 4) destination
- Calculate the semantic similarity score between the first sense of goal (objective) and the first neighbour (aim)
- Normalise it and multiply by the distributional similarity score (0.3)
- Repeat the procedure for the first sense and the second neighbour, the third neighbour, etc.
- Add up all the scores to compute the ranking score for sense 1 (objective)
- Repeat the procedure for sense 2 (score), sense 3, etc. (a code sketch of the whole procedure follows)
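Putting the steps above together, a minimal sketch of the ranking computation might look as follows. It follows the procedure described on these slides rather than the authors' implementation; the Wu-Palmer measure and the three-neighbour thesaurus entry are stand-ins.

```python
from nltk.corpus import wordnet as wn

def sense_similarity(synset_a, synset_b):
    # Stand-in WordNet similarity measure (Wu-Palmer); the talk refers to a
    # measure from the WordNet Similarity package instead.
    return synset_a.wup_similarity(synset_b) or 0.0

def max_sim_to_neighbour(sense, neighbour):
    # Similarity between one sense of the target and a neighbour word:
    # take the best score over the neighbour's own noun senses.
    return max((sense_similarity(sense, ns)
                for ns in wn.synsets(neighbour, pos=wn.NOUN)), default=0.0)

def ranking_scores(word, neighbours):
    """Prevalence (ranking) score for each noun sense of `word`, given
    (neighbour, distributional-similarity) pairs from a thesaurus."""
    senses = wn.synsets(word, pos=wn.NOUN)
    scores = {sense: 0.0 for sense in senses}
    for neighbour, dss in neighbours:
        sims = {sense: max_sim_to_neighbour(sense, neighbour) for sense in senses}
        total = sum(sims.values())
        if total == 0.0:
            continue  # this neighbour contributes nothing to any sense
        for sense in senses:
            # Normalise the semantic similarity over the target's senses and
            # weight it by the distributional similarity score, then accumulate.
            scores[sense] += dss * sims[sense] / total
    return scores

# Toy neighbour list from the slides; a real thesaurus entry would have many more neighbours.
print(ranking_scores("goal", [("aim", 0.3), ("win", 0.2), ("victory", 0.1)]))
```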
13-14. Finding the Predominant Sense
- Ranking (Sports) for goal
  - Score (1.2345)
  - Objective (0.8345)
  - Destination (0.5434)
  - Game equipment (0.4536)
- Ranking (Finance) for goal
  - Objective (1.3452)
  - Score (0.9450)
  - Destination (0.4374)
  - Game equipment (0.3536)
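Given rankings like these, the domain-specific predominant (first) sense is just the top-ranked sense for that domain. A small illustrative snippet, using the example scores from the slide:

```python
# Example rankings for 'goal' from the slide (scores are illustrative).
rankings = {
    "Sports":  {"score": 1.2345, "objective": 0.8345,
                "destination": 0.5434, "game equipment": 0.4536},
    "Finance": {"objective": 1.3452, "score": 0.9450,
                "destination": 0.4374, "game equipment": 0.3536},
}

# The predominant sense for each domain is simply the top-ranked (argmax) sense.
for domain, scores in rankings.items():
    predominant = max(scores, key=scores.get)
    print(f"{domain}: predominant sense of 'goal' = {predominant}")
```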
15. Creating the 3 Gold Standards
- Corpora used:
  - BNC (90M words, mixed)
  - Reuters Finance (32M words)
  - Reuters Sports (9M words)
- We computed a thesaurus for each of these corpora
16. Creating the 3 Gold Standards
- Word selection
  - 40 nouns, not completely random
  - 2 sets of words, chosen via:
    - Subject Field Codes (domain labels for WordNet 1.6)
    - Domain salience
17. Subject Field Codes
- 38 words have at least 1 sense labelled Sports and 1 labelled Finance
- Not all usable; three criteria (sketched in code below):
  - Frequency in the BNC of at least 1000
  - At most 12 senses
  - At least 75 examples in each corpus
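As a sketch of the filtering step, the three criteria could be applied like this; the BNC frequency and per-corpus example counts are assumed to come from precomputed tables, and the sense count uses NLTK's current WordNet rather than WordNet 1.6, so the numbers are illustrative only.

```python
from nltk.corpus import wordnet as wn

def usable(word, bnc_freq, examples_per_corpus,
           min_freq=1000, max_senses=12, min_examples=75):
    """Apply the three selection criteria from the slide.

    bnc_freq: precomputed frequency of `word` in the BNC (hypothetical input).
    examples_per_corpus: dict of corpus name -> number of examples of `word`.
    """
    return (bnc_freq >= min_freq
            and len(wn.synsets(word, pos=wn.NOUN)) <= max_senses
            and all(count >= min_examples for count in examples_per_corpus.values()))

# Hypothetical counts, for illustration only.
print(usable("goal", bnc_freq=5000,
             examples_per_corpus={"BNC": 300, "Finance": 120, "Sports": 400}))
```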
18. Subject Field Codes
- Resulting set (17 words):
  - club, manager, record, right, bill, check, competition, conversion, crew, delivery, division, fishing, reserve, return, score, receiver, running
- Includes high-frequency, mid-frequency, and low-frequency words in the BNC
19. Domain salience
- Resulting sets
  - Sport: fan, star, transfer, striker, goal, title, tie, coach
  - Finance: package, chip, bond, market, strike, bank, share, target
  - Equal: will, phase, half, top, performance, level, country
20. The Annotation Task
- Set up as an Open Mind Word Expert task
- 10 annotators
- 125 sentences randomly sampled from each corpus
- Some noise filtered
- First 100 selected
- Most sentences triple annotated
22. Characterisation of the Annotated Data
- 33,225 tagging acts
- 65% inter-annotator agreement
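The slide does not state which agreement measure was used; as a rough sketch, pairwise agreement over the tagging acts could be computed along these lines (the data layout and sense labels are assumptions).

```python
from itertools import combinations

def pairwise_agreement(annotations):
    """annotations: dict mapping sentence id -> {annotator: chosen sense label}.
    Returns the proportion of annotator pairs (per item) that agree on the sense."""
    agree = total = 0
    for labels in annotations.values():
        for a, b in combinations(sorted(labels), 2):
            total += 1
            agree += labels[a] == labels[b]
    return agree / total if total else 0.0

# Toy example: three annotators, two sentences (labels are illustrative).
print(pairwise_agreement({
    "s1": {"ann1": "score", "ann2": "score", "ann3": "objective"},
    "s2": {"ann1": "objective", "ann2": "objective", "ann3": "objective"},
}))  # 4 agreeing pairs out of 6 -> ~0.67
```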
23-25. Sense Distributions
26-30. Predominant Sense Evaluation
- Disambiguation using the predominant sense
31. Predominant Sense Evaluation
- Best results when trained on a domain-relevant corpus
- The random baseline and the SemCor baseline are always comfortably beaten
- For words that are pertinent to the domain, it pays to use domain-specific training data (see the sketch below)
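The evaluation itself reduces to checking, for each gold-tagged instance, whether the acquired predominant sense matches the annotated sense. A minimal sketch, with a hypothetical data layout and sense labels:

```python
def first_sense_accuracy(gold_instances, predicted_first_sense):
    """gold_instances: list of (word, set of gold sense labels) pairs.
    predicted_first_sense: dict mapping word -> acquired predominant sense label."""
    correct = sum(predicted_first_sense.get(word) in gold_labels
                  for word, gold_labels in gold_instances)
    return correct / len(gold_instances) if gold_instances else 0.0

# Hypothetical gold annotations for two instances of 'goal' in the Sports corpus.
gold = [("goal", {"goal:score"}), ("goal", {"goal:objective"})]
print(first_sense_accuracy(gold, {"goal": "goal:score"}))  # 0.5
```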
32-37. Discussion/Conclusions
- Sense-tagged corpora for 2 domains
- The predominant sense is much more dominant within a domain than in the general case
- Quantitative evaluation of automatic ranking
  - Automatic acquisition of predominant senses can outperform the SemCor baseline
- Choosing the predominant sense will be hard to beat for some words within a specific domain
  - Others remain highly ambiguous
38-40. Discussion/Conclusions
- When should the domain-specific predominant sense be selected for WSD?
- Look at differences in the automatic rankings: domain corpus vs. balanced corpus (see the sketch below)
  - E.g. a different predominant sense
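One simple way to operationalise this, as a sketch: compare the ranking obtained from the domain corpus with the one from the balanced corpus, and prefer the domain-specific predominant sense when the two disagree on the top sense. Only the Sports figures come from the earlier slide; the balanced-corpus scores below are made up for illustration.

```python
def prefer_domain_sense(domain_ranking, balanced_ranking):
    """Return True if the domain-specific predominant sense should be used,
    i.e. the two rankings disagree on the top-ranked sense.
    A score-divergence threshold could be substituted for this simple test."""
    top_domain = max(domain_ranking, key=domain_ranking.get)
    top_balanced = max(balanced_ranking, key=balanced_ranking.get)
    return top_domain != top_balanced

# Sports ranking for 'goal' from the earlier slide; the balanced (BNC) ranking
# is hypothetical.
sports = {"score": 1.2345, "objective": 0.8345,
          "destination": 0.5434, "game equipment": 0.4536}
bnc = {"objective": 1.10, "score": 0.90, "destination": 0.40, "game equipment": 0.30}
print(prefer_domain_sense(sports, bnc))  # True: 'score' vs 'objective'
```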
41. Future work
- Best method for quantifying substantial change
- Beyond the predominant sense: use the full ranking (sense distributions) for improved WSD models
- Influence of corpus size
- Influence of noise (robustness)
42. Thank you!