Title: Designing Statistical Language Learners: Experiments on Noun Compounds
1. Designing Statistical Language Learners: Experiments on Noun Compounds
by Mark Lauer, 1995 PhD thesis, Macquarie University
2. Outline
- Background Information
- Part I: Syntactic Analysis
- Part II: Semantic Analysis
- Recent Efforts
3. Background Information
- Material covered in Chapters 1-4 and in other sources, essential for understanding the Chapter 5 experiments.
4. What is a Compound Noun?
- A compound noun (CN) is any consecutive sequence of nouns, at least two words in length, that functions as a noun:
  - police officer, car park, bread knife, radio telescope platform coordination computer software test
- CNs often represent a shortened form of a longer expression:
  - officer in the police department; place intended for the parking of cars; knife designed specifically to cut bread; a test for the software, which is run on a computer, which coordinates the platform that houses the telescope that makes use of radio waves
5. Three Forms of CNs
- Open Form
  - dining room, disaster supplies, fish tank
- Hyphenated Form
  - sky-scraper, secretary-general, hanger-on
- Closed Form
  - boyfriend, baseball, motorcycle
- Typically, only the open form is studied in research, since the other two tend to be lexicalized.
6. Frequency
- CNs are used very frequently, and are highly productive in modern language.
- There is roughly 1 CN per 5 sentences in modern fictional prose, with about an 86% chance of encountering a previously unseen CN type. This is commonly accepted as the low end of CN frequency and generation.
- News articles typically have a new CN type in every second sentence.
- The abstracts from 293 technical articles contained 3,514 distinct CN types, or 12 distinct types per abstract.
7. Frequency (continued)
- CNs cannot simply be cataloged and analyzed:
  - There are too many in existence for this to be practical.
  - New ones are produced too regularly for a catalog to stay current.
- We must therefore have automated tools for processing them.
- However, some CNs are so common, or their meaning so removed from that of their component words, that they have been added to the language lexicon, i.e. lexicalized:
  - soap opera, scum bag, post office
8. Types of CNs
- Nominalizations: the head (rightmost word) is a nominalized form of a verb.
  - stamp collection, marathon running
  - a.k.a. Verbal-Nexus Compounds, Sentence Compounds, Sortal Nouns
- Non-Verbal-Nexus: those not formed with a nominalized head.
  - a.k.a. Deleted Predicate Nominals, Relational Nominals
- Copulative: a subclass of Non-Verbal-Nexus CNs where the modifier is a special type of the head word.
  - tuna fish, submarine ship
9. Meaning Distribution Theory
- Chapter 3 presents Lauer's defense of a predominately semantic view of language, which was not popular at that time. Its basic tenets:
  - Communication is the expression of a combination of smaller primitive elements.
  - Analysis of a text is governed by the expected meaning of that text, given all of its possible meanings.
- A full understanding of the theory is not required to understand the experiments, just some of the motivations.
10. Experiments, Part I: Syntactic Parsing
11. Parsing
- Given a compound, determine the correct parse tree, or equivalently, the bracketing of the component nouns:
  - animal cruelty committee
  - woman customs official
  - chocolate birthday party cake obsession
- CNs longer than three words can be handled in a recursive manner, so studying only the length-three case is sufficient for analysis (see the sketch below).
- For CNs of length three, it is common to refer to the bracketing as either left-branching ((a b) c) or right-branching (a (b c)).
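The recursion mentioned above can be made concrete. Below is a minimal Python sketch (mine, not the thesis's) that enumerates every binary bracketing of a compound as a nested tuple; for three nouns there are exactly two, matching the left-/right-branching distinction, and the count grows as the Catalan numbers for longer compounds.

```python
def bracketings(words):
    # Enumerate all binary bracketings of a noun compound,
    # each represented as a nested (left, right) tuple.
    words = tuple(words)
    if len(words) == 1:
        return [words[0]]
    results = []
    for i in range(1, len(words)):          # split point
        for left in bracketings(words[:i]):
            for right in bracketings(words[i:]):
                results.append((left, right))
    return results

print(bracketings(["animal", "cruelty", "committee"]))
# [('animal', ('cruelty', 'committee')),
#  (('animal', 'cruelty'), 'committee')]
```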
12. Meaning of Bracketing
- Modifier nouns (those not rightmost) modify a noun to their right in some way. The bracketing defines which noun each one modifies.
- In "animal cruelty committee", "animal" describes "cruelty", not "committee"; "animal cruelty" combined modifies "committee".
- In "woman customs official", both "woman" and "customs" modify "official", and not each other.
- The bracketing can sometimes be ambiguous, or depend on context.
13. Adjacency vs. Dependency
- The Adjacency Method of bracketing was the previously common method: given the CN "a b c", compare the suitability of "a b" and of "b c", and choose whichever is more acceptable.
- The Dependency Method is Lauer's creation. He notes that b always modifies c under either bracketing, so the better question to ask is whether "a b" is more acceptable than "a c", since under his meaning distribution theory that is what is important.
14. Acceptability
- How do we compare the acceptability or suitability of two or more given noun pairs?
- Compare the probabilities of each noun pair, scaled by the overall model probability.
15. Dependency Modeling
- The mapping of the modifier dependencies can be viewed as a tree structure, which describes the meaning of the CN.
- A given tree model can generate several different strings, all equally probable (see the sketch below).
- The task then becomes to measure the probability of having a certain model given an observed string of nouns.
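To make the "several strings" point concrete for three nouns, here is an illustrative Python sketch (my notation, not the thesis's): a dependency tree generates every word order in which each modifier precedes the noun it modifies. The left-branching chain yields one string, while the right-branching tree yields two, which is the source of the factor of two on the next slide.

```python
from itertools import permutations

def generated_strings(words, deps):
    # All orderings consistent with a dependency tree:
    # every modifier m must precede its head h.
    return [order for order in permutations(words)
            if all(order.index(m) < order.index(h) for m, h in deps)]

words = ("a", "b", "c")
# Left-branching ((a b) c): a modifies b, b modifies c.
print(generated_strings(words, [("a", "b"), ("b", "c")]))
# [('a', 'b', 'c')]                      -- one string
# Right-branching (a (b c)): a modifies c, b modifies c.
print(generated_strings(words, [("a", "c"), ("b", "c")]))
# [('a', 'b', 'c'), ('b', 'a', 'c')]     -- two strings
```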
16. Model Probability
- Writing P(x -> y) for the probability that noun x modifies noun y, the probability of a dependency tree t given an observed string w is proportional to the product of its dependency probabilities, times the probability that t generates w:
  P(t | w) ∝ P(w | t) × Π P(x -> y) over the edges (x -> y) of t
- Since a tree generates each of its consistent strings equiprobably, P(w | t) is 1 divided by the number of strings t can generate.
17. Model Probability (continued)
- For three-word CNs "a b c", we get:
  P(left | a b c) ∝ P(a -> b) × P(b -> c) (the left tree generates one string)
  P(right | a b c) ∝ (1/2) × P(a -> c) × P(b -> c) (the right tree generates two strings)
- When comparing, the shared factor P(b -> c) cancels, so this simplifies to:
  choose left iff P(a -> b) ≥ (1/2) × P(a -> c)
- Or: choose left bracketing unless "a c" is more than twice as probable as "a b".
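A minimal sketch of the two decision procedures, assuming some estimate prob(x, y) of how likely noun x is to modify noun y is already available (the prob argument here is a hypothetical stand-in for the corpus estimates of slide 19):

```python
def bracket_dependency(a, b, c, prob):
    # Lauer's dependency rule: prefer left-branching unless
    # a->c is more than twice as probable as a->b.
    return "left" if 2 * prob(a, b) >= prob(a, c) else "right"

def bracket_adjacency(a, b, c, prob):
    # The older adjacency rule: compare the adjacent pairs directly.
    return "left" if prob(a, b) >= prob(b, c) else "right"
```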
18. Class-Based Smoothing
- To overcome the problem of data requirements, Lauer proposes smoothing each word to a semantic class, and then computing probabilities based on those classes instead. This smoothing makes the preceding equations much less elegant, but in predictable ways.
- Each word is assumed to belong to at least one semantic class, and all classes to which it belongs are considered equiprobable.
- Lauer makes use of Roget's Thesaurus to define the semantic classes of nouns (see the sketch below).
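A rough sketch of the idea, with a hypothetical toy thesaurus standing in for Roget's categories: pair probabilities are computed over classes and averaged over each word's (equiprobable) class memberships, so counts for one member of a class generalize to the others.

```python
from collections import defaultdict

# Hypothetical toy thesaurus: word -> set of semantic classes.
CLASSES = {
    "bread": {"FOOD"}, "fruit": {"FOOD"},
    "knife": {"TOOL"}, "fork": {"TOOL"},
}

def class_pair_prob(w1, w2, class_counts, total):
    # Estimate P(w1 modifies w2) by averaging the class-level
    # probability over all class pairs the two words belong to.
    c1s, c2s = CLASSES[w1], CLASSES[w2]
    p = sum(class_counts[(c1, c2)] / total
            for c1 in c1s for c2 in c2s)
    return p / (len(c1s) * len(c2s))

# Class co-occurrence counts gathered from a corpus (toy numbers):
counts = defaultdict(int, {("FOOD", "TOOL"): 3})
print(class_pair_prob("fruit", "fork", counts, total=10))  # 0.3
# "fruit fork" gets probability mass even if it never occurred,
# because "bread knife" did.
```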
19. Determining Noun Pair Probabilities
- Scan a suitably large corpus (The New Grolier's Multimedia Encyclopedia) for occurrences of each noun pair appearing as a lone CN, and assign probabilities accordingly.
- This introduces the lasting "Lauer Heuristic" for identifying CNs in text: look for sequences of words known to be nouns, surrounded by words not known to be nouns (see the sketch below).
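A minimal sketch of that heuristic, assuming a lexicon of words known to be nouns (the toy NOUNS set below is hypothetical):

```python
NOUNS = {"radio", "telescope", "platform", "police", "officer"}

def candidate_compounds(tokens, min_len=2):
    # Maximal runs of known nouns of length >= min_len, in the
    # spirit of the Lauer Heuristic: noun sequences whose
    # neighbouring words are not known nouns.
    compounds, run = [], []
    for tok in tokens + ["</s>"]:   # sentinel flushes the last run
        if tok in NOUNS:
            run.append(tok)
        else:
            if len(run) >= min_len:
                compounds.append(tuple(run))
            run = []
    return compounds

print(candidate_compounds("the radio telescope platform was built".split()))
# [('radio', 'telescope', 'platform')]
```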
20. Experiments
- Experiments were conducted varying several different features:
  - Dependency vs. Adjacency models
  - Adjacent vs. windowed identification of CNs
  - Symmetric vs. asymmetric counts
  - Effects of model probability scaling
  - Class-smoothed counts vs. counting each word separately
  - POS-tagged data vs. a list of known nouns
21. Summary of Results
- The Dependency model consistently outperforms the Adjacency model, as well as the baseline of always guessing left.
- Windowed counting only hurts accuracy.
- Asymmetric counts are marginally better than symmetric ones.
- Model probability tuning significantly helps the Adjacency model, but not the Dependency model.
- Class-based smoothing provides significant improvements.
- POS tagging can provide moderate improvements to the estimation process.
22. Semantics
23. Semantics
- Now that we have bracketed a given CN, we can extract a set of length-two CNs, one for each edge in the dependency graph, which together form the entire meaning of the CN (see the sketch below).
- But what does each two-word CN mean?
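A small sketch of that extraction, reusing the nested-tuple bracketings from the parsing sketch earlier: the head of a subtree is its rightmost leaf, and each internal node of the bracketing contributes one (modifier, head) edge.

```python
def head(tree):
    # The head of a bracketing is its rightmost leaf.
    return tree if isinstance(tree, str) else head(tree[1])

def edges(tree):
    # One (modifier, head) pair per internal node, i.e. per
    # edge of the dependency graph.
    if isinstance(tree, str):
        return []
    left, right = tree
    return edges(left) + edges(right) + [(head(left), head(right))]

print(edges((("animal", "cruelty"), "committee")))
# [('animal', 'cruelty'), ('cruelty', 'committee')]
```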
24. Defining Semantics
- Given the two words of the CN, what is the relation between those two words expressed by the CN?
- There is no consensus about which list of relations to use.
- If such a detailed list were created, it would be massively long, and likely still incomplete.
- Instead, analysis is limited to a few common classes.
- Lauer chose to use prepositional paraphrasing.
25. Prepositional Paraphrasing
- Only applies to Non-Verbal-Nexus, Non-Copulative CNs.
- Interpret "a b" as "b <prep> a", where <prep> is one of: of, for, in, at, on, from, with, about.
  - state laws -> laws of the state
  - baby chair -> chair for babies
  - reactor waste -> waste from a reactor
26. Prepositional Paraphrasing
- Pros:
  - A concrete, small list of classes
  - Easily identified in corpus texts
  - Commonly used
- Cons:
  - Does not always apply ($50 jacket, cold virus)
  - A very shallow representation of semantics
  - Certain nouns present lexical preferences for particular prepositions, which can skew empirical results
  - Some relations can be expressed by multiple prepositions
27. Predicting Paraphrases
- When predicting which preposition to use in the paraphrase, it is a simple case of choosing the most probable:
  p* = argmax_p P(p | a, b) for the CN "a b"
- After some assumptions regarding independence and uniformity, and applying Bayes' Theorem, this simplifies to choosing the preposition that maximizes the product of two separately estimated factors, one for the head and one for the modifier:
  p* = argmax_p Phead(p | b) × Pobj(p | a)
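A sketch of that decision rule, assuming the two conditional distributions have already been estimated as on the next slide (the flat dictionary layout here is hypothetical):

```python
PREPS = ["of", "for", "in", "at", "on", "from", "with", "about"]

def best_paraphrase(modifier, head, phead, pobj):
    # Pick the preposition maximizing Phead(p | head) * Pobj(p | modifier),
    # e.g. "state laws" -> "laws of the state".
    return max(PREPS,
               key=lambda p: phead.get((head, p), 0.0)
                             * pobj.get((modifier, p), 0.0))

# Toy tables: Phead[(noun, prep)] and Pobj[(noun, prep)].
phead = {("laws", "of"): 0.4, ("laws", "for"): 0.2}
pobj = {("state", "of"): 0.3, ("state", "for"): 0.1}
print(best_paraphrase("state", "laws", phead, pobj))  # of
```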
28. Estimating Pobj and Phead
- Use a POS-tagged corpus (or an automatic POS tagger). Consider NN, NNS, NNP, NNPS, and VBG to all be nouns.
- For Phead(n, p), look for n tagged as a noun, followed by p.
- For Pobj(n, p), look for p, followed by up to three of JJ, DT, CD, PRP$, POS, followed by n. The words between p and n are assumed to modify n, and thus p and n are still associated (see the sketch below).
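A rough sketch of those two counting patterns over a POS-tagged corpus, assuming tokens arrive as (word, tag) pairs with Penn Treebank tags; normalizing the raw counts into probabilities is left implicit:

```python
from collections import Counter

NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS", "VBG"}
SKIP_TAGS = {"JJ", "DT", "CD", "PRP$", "POS"}   # may sit between p and n
PREPS = {"of", "for", "in", "at", "on", "from", "with", "about"}

def count_patterns(tagged):
    # Collect raw counts for Phead (noun immediately followed by a
    # preposition) and Pobj (preposition, up to three intervening
    # modifiers, then a noun).
    head_counts, obj_counts = Counter(), Counter()
    for i, (word, tag) in enumerate(tagged):
        if tag in NOUN_TAGS and i + 1 < len(tagged):
            nxt = tagged[i + 1][0]
            if nxt in PREPS:
                head_counts[(word, nxt)] += 1
        if word in PREPS:
            j = i + 1
            while (j < len(tagged) and j - (i + 1) < 3
                   and tagged[j][1] in SKIP_TAGS):
                j += 1
            if j < len(tagged) and tagged[j][1] in NOUN_TAGS:
                obj_counts[(tagged[j][0], word)] += 1
    return head_counts, obj_counts

sent = [("laws", "NNS"), ("of", "IN"), ("the", "DT"), ("state", "NN")]
print(count_patterns(sent))
# (Counter({('laws', 'of'): 1}), Counter({('state', 'of'): 1}))
```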
29. Experiments
- Compared to the parsing experiments, there were relatively few experiments performed:
  - Word-only classes vs. Roget's Thesaurus classes
  - MLE (maximum likelihood) vs. ELE (expected likelihood) estimates of probabilities
  - Restriction of predictions to only a select few prepositions
30. Results
- Overall, the results are abysmal, only barely reaching significance above the baseline of always guessing "of" (the most common relation).
- Word-based counts tend to perform marginally better than class-smoothed counts.
- ELE slightly improves class-based estimation over MLE, but significantly hurts word-only estimation.
- Restricting guesses to only the most common prepositions can significantly increase accuracy, but at the cost of never guessing the less frequent relations.
31. Recent Efforts
32. Recent Efforts
- Keller and Lapata have analyzed the effect of using Internet search engine result counts for estimating probabilities. Their results are comparable.
- Lapata has also attempted to resolve nominalized CNs through corpus statistics, with significant results.
- Moldovan et al. applied several learning algorithms to the task of semantic labeling, with a more detailed list of relations, and achieved significant results.