1
Domain-Specific Sense Distributions and
Predominant Sense Acquisition
  • Rob Koeling, Diana McCarthy, John Carroll
  • University of Sussex

2
Overview
  • Motivation
  • Finding Predominant Senses
  • Creating the 3 Gold Standards
  • Predominant Sense Evaluation
  • Conclusions / Future Work

3
Motivation
  • Distributions of word senses are often highly
    skewed
  • Manually sense-tagged data (SemCor) is used for
    WSD
  • either as training data for a statistical model
  • or as a back-off when the primary model fails
  • The first sense heuristic is powerful (see the
    sketch below)
  • 61.5% on the Senseval-3 all-words task
  • Information about the domain of a document can
    help WSD
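
A minimal sketch of the first sense heuristic named above, using NLTK's
WordNet interface as an assumed stand-in (the slides do not name a
toolkit): every occurrence of a word is simply tagged with its
first-listed, i.e. most frequent, sense, ignoring context.

from nltk.corpus import wordnet as wn  # assumes nltk.download('wordnet')

def first_sense(word, pos=wn.NOUN):
    """First sense heuristic: always return the first-listed
    (most frequent) WordNet sense, ignoring the context."""
    synsets = wn.synsets(word, pos=pos)
    return synsets[0] if synsets else None

print(first_sense("goal"))  # e.g. Synset('goal.n.01')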

4
Motivation
  • Consider the word goal (WordNet 1.7.1; see the
    lookup snippet below)
  • Synonyms/hypernyms (ordered by estimated
    frequency) of the noun goal
  • 4 senses of goal
  • Sense 1: goal, end, objective → content,
    cognitive content, mental object
  • Sense 2: goal → score
  • Sense 3: goal → game equipment
  • Sense 4: destination, goal → point
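
The listing above can be reproduced with a few lines of NLTK (a sketch
assuming NLTK's bundled WordNet 3.0, whose sense numbering may differ
from the WordNet 1.7.1 output shown on this slide):

from nltk.corpus import wordnet as wn  # assumes nltk.download('wordnet')

# Print each noun sense of "goal" with its direct hypernyms.
for s in wn.synsets("goal", pos=wn.NOUN):
    hypers = ", ".join(h.name() for h in s.hypernyms())
    print(f"{s.name()} -> {hypers}")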

5
Motivation
  • Sense distributions (for some words) are
    domain-specific
  • Beneficial for WSD
  • No existing domain-specific sense-tagged corpora
  • Automatic ranking applied to specific domains

6
Goal (mental object)
  • Create sense-tagged data for a selection of 40
    words for different domains
  • Characterize the annotated data
  • Apply the automatic ranking method to these
    domains and evaluate it on the sense-tagged data

7
Finding Predominant Senses
  • Ingredients:
  • Automatically created thesaurus (e.g. Lin 1998)
  • goal: aim (0.3), win (0.2), …, victory (0.1)
  • Sense inventory (e.g. WordNet)
  • 1) objective, 2) score, 3) game equipment,
    4) destination
  • Sense similarity score (e.g. as defined in the
    WordNet::Similarity package)

8-12
Calculating the Prevalence of Word Senses
GOAL
Neighbours: aim (0.3), win (0.2), …, victory (0.1)
Senses: 1) objective, 2) score, 3) game equipment,
4) destination
  • Calculate the semantic similarity score between
    the first sense of goal (objective) and the
    first neighbour (aim)
  • Normalize and multiply by the distributional
    similarity score (0.3)
  • Repeat the procedure for the first sense and the
    second neighbour, third neighbour, etc.
  • Add up all the scores to compute the ranking
    score for sense 1 (objective)
  • Repeat the procedure for sense 2 (score),
    sense 3, etc. (see the sketch below)
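
A minimal Python sketch of the ranking computation walked through above
(in the style of McCarthy et al. 2004). The sem_sim function and the toy
similarity values are illustrative assumptions, not real
WordNet::Similarity scores.

def ranking_scores(senses, neighbours, sem_sim):
    """For each sense of the target word, sum over its distributional
    neighbours: distributional similarity x (semantic similarity of
    this sense to the neighbour, normalized over all senses)."""
    scores = {}
    for sense in senses:
        total = 0.0
        for neighbour, dist_sim in neighbours:
            norm = sum(sem_sim(s, neighbour) for s in senses)
            if norm > 0:
                total += dist_sim * sem_sim(sense, neighbour) / norm
        scores[sense] = total
    return scores

# Toy run for "goal" with the neighbours shown on this slide.
senses = ["objective", "score", "game equipment", "destination"]
neighbours = [("aim", 0.3), ("win", 0.2), ("victory", 0.1)]
toy = {("objective", "aim"): 0.9, ("score", "win"): 0.8,
       ("score", "victory"): 0.7}
sem_sim = lambda s, n: toy.get((s, n), 0.1)  # assumed default similarity

ranks = ranking_scores(senses, neighbours, sem_sim)
print(max(ranks, key=ranks.get))  # predominant sense under this toy data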

13-14
Finding the Predominant Sense
  • Ranking (Sports) for goal
  • Score (1.2345)
  • Objective (0.8345)
  • Destination (0.5434)
  • Game equipment (0.4536)
  • Ranking (Finance) for goal
  • Objective (1.3452)
  • Score (0.9450)
  • Destination (0.4374)
  • Game equipment (0.3536)

15
Creating the 3 Gold Standards
  • Corpora used:
  • BNC (90M words, mixed domains)
  • Reuters Finance (32M words)
  • Reuters Sports (9M words)
  • We computed a thesaurus from each of these
    corpora

16
Creating the 3 Gold Standards
  • Word selection
  • 40 nouns, not completely random
  • 2 sets of words:
  • Subject Field Codes (domain labels for WordNet
    1.6)
  • Domain salience

17
Subject Field Codes
  • 38 words have at least 1 sense labelled Sports
    and 1 labelled Finance
  • Not all usable; three criteria (see the filter
    sketch below):
  • Frequency in BNC ≥ 1000
  • At most 12 senses
  • At least 75 examples in each corpus
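
A hedged sketch of the three filters above; the lookup tables and their
values are toy assumptions standing in for the real corpus statistics.

# Toy lookup tables (invented values, for illustration only).
bnc_freq = {"club": 14256, "crease": 820}    # frequency in the BNC
n_senses = {"club": 7, "crease": 5}          # WordNet sense counts
min_examples = {"club": 312, "crease": 40}   # min over both Reuters corpora

usable = [w for w in bnc_freq
          if bnc_freq[w] >= 1000       # frequency in BNC >= 1000
          and n_senses[w] <= 12        # at most 12 senses
          and min_examples[w] >= 75]   # >= 75 examples in each corpus
print(usable)  # ['club']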

18
Subject Field Codes
  • Resulting set (17 words)
  • club, manager, record, right, bill, check,
    competition, conversion, crew, delivery,
    division, fishing, reserve, return, score,
    receiver, running
  • Covers high-, mid-, and low-frequency words in
    the BNC

19
Domain salience
  • Resulting sets
  • Sport: fan, star, transfer, striker, goal,
    title, tie, coach
  • Finance: package, chip, bond, market, strike,
    bank, share, target
  • Equal: will, phase, half, top, performance,
    level, country

20
The Annotation Task
  • Set up as an Open Mind Word Expert task
  • 10 annotators
  • 125 sentences randomly sampled from each corpus
    (see the sketch below)
  • Some noise filtered out
  • First 100 sentences selected
  • Most sentences triple-annotated
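
A tiny sketch of the sampling step described above; corpus_sentences and
the is_noisy filter are assumed placeholders for the real corpus and
noise check.

import random

def sample_instances(corpus_sentences, is_noisy, k=125, keep=100):
    """Randomly sample k sentences, drop noisy ones, keep the first 100."""
    sampled = random.sample(corpus_sentences, k)
    clean = [s for s in sampled if not is_noisy(s)]
    return clean[:keep]

# Toy usage with fabricated sentences and a trivial noise test.
sents = [f"example sentence {i}" for i in range(1000)]
print(len(sample_instances(sents, is_noisy=lambda s: len(s) < 5)))  # 100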

21
(No Transcript)
22
Characterisation of the Annotated Data
  • 33,225 tagging acts
  • 65% inter-annotator agreement (see the sketch
    below)
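
A minimal sketch of one way to compute pairwise inter-annotator
agreement; the slides do not specify the exact measure, so this is an
assumption.

from itertools import combinations

def pairwise_agreement(items):
    """items: one list of sense labels per sentence (one label per
    annotator); agreement = fraction of annotator pairs that match."""
    agree = total = 0
    for labels in items:
        for a, b in combinations(labels, 2):
            total += 1
            agree += (a == b)
    return agree / total if total else 0.0

print(pairwise_agreement([["s1", "s1", "s2"], ["s1", "s1", "s1"]]))  # ~0.67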

23-25
Sense Distributions
(No Transcript)
26-30
Predominant Sense Evaluation
  • Disambiguation using predominant sense
(No Transcript)
31
Predominant Sense Evaluation
  • Best results when trained on a domain-relevant
    corpus
  • The random baseline and the SemCor baseline are
    always comfortably beaten
  • For words that are pertinent to the domain, it
    pays to use domain-specific training data (see
    the sketch below)
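
A small sketch of the evaluation logic summarized above, on toy data:
the accuracy of always choosing the acquired predominant sense,
compared with a random baseline (the expected accuracy of a uniform
guess over a word's senses).

def accuracy(gold, predominant):
    """gold: (word, correct_sense) pairs; always predict the
    predominant sense of the word."""
    return sum(predominant[w] == s for w, s in gold) / len(gold)

def random_baseline(gold, n_senses):
    """Expected accuracy of guessing uniformly among a word's senses."""
    return sum(1.0 / n_senses[w] for w, _ in gold) / len(gold)

gold = [("goal", "score"), ("goal", "score"), ("goal", "objective")]
print(accuracy(gold, {"goal": "score"}))   # ~0.67 on this toy data
print(random_baseline(gold, {"goal": 4}))  # 0.25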

32-37
Discussion/Conclusions
  • Sense-tagged corpora for 2 domains
  • The predominant sense is much more dominant
    within a domain than in the general case
  • Quantitative evaluation of automatic ranking
  • Automatic acquisition of predominant senses can
    outperform the SemCor baseline
  • Choosing the predominant sense will be hard to
    beat for some words within a specific domain
  • Other words remain highly ambiguous

38-40
Discussion/Conclusions
  • When to select the domain-specific predominant
    sense for WSD?
  • Look at differences in the automatic ranking:
    domain corpus vs. balanced corpus
  • E.g. a different predominant sense (see the
    sketch below)
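
A hedged sketch of this selection heuristic: adopt the domain-specific
predominant sense only for words whose top-ranked sense differs between
the domain ranking and the balanced (BNC) ranking. The helper names and
scores are illustrative assumptions.

def top_sense(ranking):
    """Sense with the highest ranking score."""
    return max(ranking, key=ranking.get)

def changed_words(domain_ranks, balanced_ranks):
    """Words whose predominant sense differs between the two rankings."""
    return [w for w in domain_ranks
            if top_sense(domain_ranks[w]) != top_sense(balanced_ranks[w])]

sports = {"goal": {"score": 1.23, "objective": 0.83}}
bnc = {"goal": {"score": 0.71, "objective": 1.10}}
print(changed_words(sports, bnc))  # ['goal']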

41
Future work
  • Best method for quantifying a substantial change
    in the sense ranking
  • Beyond the predominant sense: use the full
    ranking (sense distributions) for improved WSD
    models
  • Influence of corpus size
  • Influence of noise (robustness)

42
Thank you!