Designing Statistical Language Learners: Experiments on Noun Compounds - PowerPoint PPT Presentation

1
Designing Statistical Language Learners: Experiments on Noun Compounds
by Mark Lauer, 1995 PhD thesis, Macquarie University
  • presented by Rod Adams

2
Outline
  • Background Information
  • Part I Syntactic Analysis
  • Part II Semantic Analysis
  • Recent Efforts

3
Background Information
  • Material Covered in Chapters 1-4 and from other
    sources, essential for the understanding of the
    Chapter 5 Experiments.

4
What is a Compound Noun?
  • A compound noun (CN) is any consecutive sequence of
    nouns, at least two words in length, that functions
    as a noun.
  • police officer, car park, bread knife,
    radio telescope platform coordination computer
    software test
  • CNs often represent a shortened form of a longer
    expression:
  • officer in the police department, place
    intended for the parking of cars, knife
    designed specifically to cut bread, a test for
    the software, which is run on a computer, which
    coordinates the platform that houses the
    telescope that makes use of radio waves

5
Three Forms of CNs
  • Open Form
  • dining room, disaster supplies, fish tank
  • Hyphenated Form
  • sky-scraper, secretary-general, hanger-on
  • Closed Form
  • boyfriend, baseball, motorcycle
  • Typically, only the open form is studied in
    research, since the other two tend to be
    lexicalized.

6
Frequency
  • CNs are used very frequently, and are highly
    productive in modern language.
  • Roughly 1 CN appears per 5 sentences in modern
    fictional prose, with about an 86% chance of
    encountering a previously unseen CN type. This is
    commonly accepted as the low end of CN frequency
    and productivity.
  • News articles typically have a new CN type in
    every second sentence.
  • The abstracts from 293 technical articles
    contained 3,514 distinct types, or 12 distinct
    types per abstract.

7
Frequency (continued)
  • CNs cannot simply be cataloged and analyzed
  • Too many exist for a catalog to be practical
  • New ones are coined too regularly for a catalog
    to stay current
  • Automated tools for processing are a must
  • However, some are so common, or their meaning so
    removed from the component words, that they have
    been added to the language's lexicon, a.k.a.
    lexicalized
  • soap opera, scum bag, post office

8
Types of CNs
  • Nominalizations: The head, or rightmost word, is
    a nominalized form of a verb.
  • stamp collection, marathon running
  • a.k.a. Verbal-Nexus Compounds, Sentence
    Compounds, Sortal Nouns
  • Non-Verbal-Nexus: Those not formed with a
    nominalized head.
  • a.k.a. Deleted Predicate Nominals, Relational
    Nominals
  • Copulative: A subclass of Non-Verbal-Nexus CNs
    where the modifier is a special type of the head
    word.
  • tuna fish, submarine ship

9
Meaning Distribution Theory
  • Chapter 3 presents Lauer's defense of a
    predominantly semantic view of language, which
    was not popular at that time. Basic tenets:
  • Communication is the expression of a combination
    of smaller primitive elements.
  • Analysis of text is governed by the expected
    meaning of a given text, given all its possible
    meanings.
  • A full understanding is not required to follow
    the experiments, just some of the motivations.

10
Experiments, Part I: Syntactic Parsing
11
Parsing
  • Given a compound, determine the correct parse
    tree or, equivalently, the bracketing of the
    component nouns.
  • [[animal cruelty] committee]
  • [woman [customs official]]
  • chocolate birthday party cake obsession
  • CNs longer than three words can be handled
    recursively, so studying only the three-word
    case is sufficient for analysis.
  • For CNs of length three, it is common to refer to
    the bracketing as either left-branching ((a b)
    c) or right-branching (a (b c)).
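
The recursive reduction mentioned above can be sketched briefly. This is an illustrative enumerator of binary bracketings (function and variable names are mine, not the thesis's): every split point pairs a left sub-bracketing with a right one, so longer compounds reduce to nested three-word decisions.

```python
def bracketings(nouns):
    """Enumerate every binary bracketing of a noun compound.

    Compounds longer than three nouns reduce recursively: each
    split point combines a left sub-bracketing with a right one.
    """
    if len(nouns) == 1:
        return [nouns[0]]
    trees = []
    for i in range(1, len(nouns)):          # choose a split point
        for left in bracketings(nouns[:i]):
            for right in bracketings(nouns[i:]):
                trees.append((left, right))
    return trees

# A three-noun CN has exactly two bracketings:
for tree in bracketings(["animal", "cruelty", "committee"]):
    print(tree)
# ('animal', ('cruelty', 'committee'))    right-branching
# (('animal', 'cruelty'), 'committee')    left-branching
```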

12
Meaning of Bracketing
  • Modifier nouns (those not rightmost) modify a
    noun to their right in some way. The bracketing
    defines which noun each one modifies.
  • In "animal cruelty committee", "animal" is
    describing "cruelty", not "committee"; "animal
    cruelty" combined modifies "committee".
  • In "woman customs official", both "woman" and
    "customs" modify "official", and not each other.
  • The bracketing can sometimes be ambiguous, or
    depend on context.

13
Adjacency vs. Dependency
  • The Adjacency Method of bracketing was the
    previously common approach. Given the CN "a b c",
    compare the suitability of "a b" and of "b c",
    and choose whichever is more acceptable.
  • The Dependency Method is Lauer's creation: he
    notes that b always modifies c, so the better
    question is whether "a b" is more acceptable
    than "a c", since under his meaning distribution
    theory that is what's important.
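
The contrast between the two methods can be shown side by side. The association counts below are invented for illustration (they are not from the thesis), but they reproduce the left-branching reading of "animal cruelty committee" from the previous slide:

```python
# Toy association counts standing in for pair "acceptability";
# the numbers are invented for illustration only.
COUNTS = {
    ("animal", "cruelty"): 3,
    ("cruelty", "committee"): 5,
    ("animal", "committee"): 1,
}

def accept(x, y):
    return COUNTS.get((x, y), 0)

def adjacency_bracket(a, b, c):
    # Adjacency: compare the two adjacent pairs (a, b) and (b, c).
    return "left" if accept(a, b) >= accept(b, c) else "right"

def dependency_bracket(a, b, c):
    # Dependency: b always modifies c, so the real question is
    # which noun a modifies -- compare (a, b) against (a, c).
    return "left" if accept(a, b) >= accept(a, c) else "right"

print(adjacency_bracket("animal", "cruelty", "committee"))   # right
print(dependency_bracket("animal", "cruelty", "committee"))  # left
```

With these counts the two methods disagree, and only the dependency method recovers the correct left-branching analysis.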

14
Acceptability
  • How do we compare the acceptability or
    suitability of two or more given noun pairs?
  • Compare the probabilities of each noun pair,
    scaled by the overall model probability.

15
Dependency Modeling
  • The mappings of the modifier dependencies can be
    viewed as a tree structure, which describes the
    meaning of the CN.
  • A given tree model can generate several different
    strings, all equally probable.
  • The task then becomes to measure the probability
    of having a certain model given an observed
    string of nouns.

16
Model Probability
17
Model Probability (continued)
  • For three-word CNs, we get

When comparing, this simplifies to
Or: choose left bracketing unless "b c" is more
than twice as probable as "a b".
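
The decision rule stated above is simple enough to write out directly. This sketch implements the rule exactly as the slide gives it (the equations themselves were images on the original slide); the factor of two comes from the model-probability scaling:

```python
def choose_bracketing(p_ab, p_bc):
    """Rule as stated on the slide: prefer left-branching unless
    the (b, c) pair is more than twice as probable as (a, b)."""
    return "right" if p_bc > 2 * p_ab else "left"

print(choose_bracketing(0.10, 0.15))  # left: 0.15 <= 2 * 0.10
print(choose_bracketing(0.10, 0.25))  # right: 0.25 > 2 * 0.10
```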
18
Class-Based Smoothing
  • To overcome the problem of data requirements,
    Lauer proposes smoothing each word to a semantic
    class, and then computing probabilities based on
    those classes instead. This smoothing makes the
    preceding equations much less elegant, but in
    predictable ways.
  • Each word is assumed to belong to at least one
    semantic class, and all classes to which it
    belongs are considered equiprobable.
  • Lauer makes use of Roget's Thesaurus to define
    semantic classes of nouns.
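
The equiprobable-class assumption above suggests a simple averaging scheme. This is a minimal sketch, with invented thesaurus classes and class probabilities (not Roget's actual categories or the thesis's numbers):

```python
from itertools import product

# Hypothetical thesaurus: each noun belongs to at least one
# semantic class, all classes assumed equiprobable for that noun.
THESAURUS = {
    "bread": ["food"],
    "knife": ["tool", "weapon"],
}

def smoothed_pair_prob(class_pair_prob, w1, w2):
    """Smooth a word pair to its semantic classes: average the
    class-pair probability over every combination of classes the
    two words can belong to, each combination weighted equally."""
    c1, c2 = THESAURUS[w1], THESAURUS[w2]
    total = sum(class_pair_prob(x, y) for x, y in product(c1, c2))
    return total / (len(c1) * len(c2))

# With invented class-pair probabilities, (0.4 + 0.2) / 2 averages
# the two possible class readings of "knife":
probs = {("food", "tool"): 0.4, ("food", "weapon"): 0.2}
result = smoothed_pair_prob(lambda x, y: probs.get((x, y), 0.0),
                            "bread", "knife")
```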

19
Determining Noun Pair Probabilities
  • Scan a suitably large corpus (The New Grolier's
    Multimedia Encyclopedia) for occurrences of each
    noun pair appearing as a lone CN. Assign
    probabilities accordingly.
  • Introduces the lasting "Lauer Heuristic" for
    identifying CNs in text: look for sequences of
    words known to be nouns, surrounded by words not
    known to always be nouns.
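
A minimal sketch of that heuristic over a token list, assuming a precompiled set of known nouns (the names and the simplified membership test are mine, not the thesis's exact procedure):

```python
def candidate_compounds(tokens, known_nouns):
    """Collect maximal runs of words known to be nouns, bordered
    on both sides by words that are not; runs of two or more
    words are CN candidates."""
    compounds, run = [], []
    for word in tokens + [None]:        # sentinel flushes the last run
        if word is not None and word in known_nouns:
            run.append(word)
        else:
            if len(run) >= 2:
                compounds.append(tuple(run))
            run = []
    return compounds

nouns = {"animal", "cruelty", "committee", "report"}
sentence = "the animal cruelty committee issued a report today".split()
print(candidate_compounds(sentence, nouns))
# [('animal', 'cruelty', 'committee')]
```

Note that the lone noun "report" is correctly excluded: a single noun is not a compound.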

20
Experiments
  • Experiments were conducted varying several
    different features:
  • Dependency vs. Adjacency Models
  • Adjacency vs. windowed identification of CNs
  • Symmetry vs. Asymmetry of counts
  • Effects of model probability scaling
  • Class smoothed counts vs. each word separately
  • POS tagged data vs. List of known nouns.

21
Summary of Results
  • The dependency model consistently outperforms the
    adjacency model, and the baseline of always
    guessing left.
  • Windowed counting only hurts accuracy.
  • Asymmetric counts are marginally better than
    symmetric ones.
  • Model Probability tuning significantly helps the
    Adjacency model, but not the Dependency model.
  • Class based smoothing provides significant
    improvements.
  • POS tagging can provide moderate improvements to
    the estimation process.

22
Semantics
23
Semantics
  • Now that we have bracketed a given CN, we can
    extract a set of length-two CNs, one for each
    edge in the dependency graph, which together
    carry the entire meaning of the CN.
  • But what does each two-word CN mean?

24
Defining Semantics
  • Given the two words of the CN, what is the
    relation between those two words expressed by the
    CN?
  • There is no consensus about which list of
    relations to use.
  • Any sufficiently detailed list would be massively
    long, and likely still incomplete.
  • Instead, limit analysis to a few common classes.
  • Lauer chose to use prepositional paraphrasing.

25
Prepositional Paraphrasing
  • Only applies to Non-Verbal-Nexus, Non-Copulative
    CNs.
  • Interpret "a b" as "b <prep> a", where <prep> is
    one of: of, for, in, at, on, from, with, about.
  • state laws → laws of the state
  • baby chair → chair for babies
  • reactor waste → waste from a reactor

26
Prepositional Paraphrasing
  • Pros
  • Concrete, small, list of classes
  • Easily identified in corpus texts
  • Commonly used
  • Cons
  • Does not always apply: $50 jacket, cold virus
  • Very shallow representation of semantics
  • Certain nouns present various lexical preferences
    for various prepositions, which can skew
    empirical results
  • Some relations can be expressed by multiple
    prepositions

27
Predicting Paraphrases
  • When predicting which preposition to use in the
    paraphrase, it is simply a case of choosing the
    most probable one.

After some assumptions regarding independence and
uniformity, and applying Bayes' Theorem, this
simplifies to
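
The simplified formula itself appeared as an image on the original slide. As a hedged stand-in, here is one plausible reading of it: score each candidate preposition by the product of a head-association and an object-association estimate, and take the argmax. The score functions and numbers below are invented for illustration:

```python
PREPOSITIONS = ["of", "for", "in", "at", "on", "from", "with", "about"]

def best_paraphrase(modifier, head, p_head, p_obj):
    """One plausible reading of the simplified prediction: take the
    preposition maximizing the product of head and object scores."""
    return max(PREPOSITIONS,
               key=lambda p: p_head(head, p) * p_obj(modifier, p))

# Invented association scores favoring "waste from a reactor":
head_scores = {("waste", "from"): 0.5}
obj_scores = {("reactor", "from"): 0.6}
p = best_paraphrase("reactor", "waste",
                    lambda n, pr: head_scores.get((n, pr), 0.1),
                    lambda n, pr: obj_scores.get((n, pr), 0.1))
print(p)  # from
```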
28
Estimating Pobj and Phead
  • Use a POS-tagged corpus (or run an automatic POS
    tagger). Consider NN, NNS, NNP, NNPS, and VBG to
    all be noun tags.
  • For Phead(n, p), look for n tagged as a noun,
    followed by p.
  • For Pobj(n, p), look for p, followed by up to
    three of JJ, DT, CD, PRP, POS, followed by n.
    The words between p and n are assumed to modify
    n, and thus p and n are still associated.
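
A minimal sketch of these two counting patterns over a hypothetical (word, tag) list; normalization of the counts into Pobj and Phead probabilities is omitted, and the tag sets are taken as the slide lists them:

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS", "VBG"}
# Tags allowed between p and n, as listed on the slide:
SKIP_TAGS = {"JJ", "DT", "CD", "PRP", "POS"}

def head_count(tagged, n, p):
    """Count occurrences of n (tagged as a noun) immediately
    followed by the preposition p."""
    return sum(1 for (w1, t1), (w2, _) in zip(tagged, tagged[1:])
               if w1 == n and t1 in NOUN_TAGS and w2 == p)

def obj_count(tagged, n, p):
    """Count occurrences of p followed, after at most three words
    with skippable tags, by n tagged as a noun."""
    hits = 0
    for i, (word, _) in enumerate(tagged):
        if word != p:
            continue
        j = i + 1
        while j < len(tagged) and j - i <= 3 and tagged[j][1] in SKIP_TAGS:
            j += 1                      # skip intervening modifiers
        if j < len(tagged) and tagged[j][0] == n and tagged[j][1] in NOUN_TAGS:
            hits += 1
    return hits

corpus = [("waste", "NN"), ("from", "IN"), ("a", "DT"),
          ("nuclear", "JJ"), ("reactor", "NN")]
print(head_count(corpus, "waste", "from"))   # 1
print(obj_count(corpus, "reactor", "from"))  # 1
```

Both patterns fire on "waste from a nuclear reactor", associating "waste" and "reactor" with "from" despite the intervening determiner and adjective.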

29
Experiments
  • Compared to the parsing experiments, relatively
    few experiments were performed:
  • Word-only classes vs. Roget's Thesaurus classes.
  • MLE vs. ELE estimates of probabilities.
  • Restriction of predictions to only a select few
    prepositions.

30
Results
  • Overall, the results are abysmal, only barely
    reaching significance above the baseline of
    always guessing "of" (the most common relation).
  • Word-based counts tend to perform marginally
    better than class-smoothed counts.
  • ELE slightly improves class-based estimation over
    MLE, but significantly hurts word-only
    estimation.
  • Restricting guesses to only the most common
    relations can significantly increase accuracy,
    but at the cost of never guessing the less
    frequent relations.

31
Recent Efforts
32
Recent Efforts
  • Keller and Lapata have analyzed the effect of
    using Internet search engine result counts for
    estimating probabilities. Their results are
    comparable.
  • Lapata has also attempted to resolve nominalized
    CNs through corpus statistics, with significant
    results.
  • Moldovan et al. applied several learning
    algorithms to the task of semantic labeling, with
    a more detailed list of relations, and achieved
    significant results.