CS598 DNR FALL 2005 Machine Learning in Natural Language - PowerPoint PPT Presentation

Title:

CS598 DNR FALL 2005 Machine Learning in Natural Language

Slides: 37
Provided by: DanMol1


1
CS598 DNR FALL 2005: Machine Learning in Natural Language
  • Introduction Part 3
  • Linguistics Essentials
  • (The role of Linguistics in NLP)

2
Introduction
  • This is not a class in NLP but we want to
    discuss how to make progress in natural language
    understanding
  • Introduce basic linguistics concepts.
  • Basic terminology
  • Discuss the levels of analysis used in NLP
  • Problems associated with each level.

3
Comprehension
  • (ENGLAND, June, 1989) - Christopher Robin is
    alive and well. He lives in England. He is the
    same person that you read about in the book,
    Winnie the Pooh. As a boy, Chris lived in a
    pretty home called Cotchfield Farm. When Chris
    was three years old, his father wrote a poem
    about him. The poem was printed in a magazine
    for others to read. Mr. Robin then wrote a book.
    He made up a fairy tale land where Chris lived.
    His friends were animals. There was a bear
    called Winnie the Pooh. There was also an owl
    and a young pig, called a piglet. All the
    animals were stuffed toys that Chris owned. Mr.
    Robin made them come to life with his words. The
    places in the story were all near Cotchfield
    Farm. Winnie the Pooh was written in 1925.
    Children still love to read about Christopher
    Robin and his animal friends. Most people don't
    know he is a real person who is grown now. He
    has written two books of his own. They tell what
    it is like to be famous.
  • 1. Who is Christopher Robin?
  • 2. When was Winnie the Pooh written?
  • 3. What did Mr. Robin do when Chris was three
    years old?
  • 4. Where did young Chris live?
  • 5. Why did Chris write two books of his own?

Other motivating problems: Entailment, Translation, Generation
4
Introduction
  • Discuss the levels of analysis used in NLP
  • Problems associated with each level.
  • For each level of linguistic analysis we will
    ask:
  • What are the problems here?
  • What would we consider as a solution?

5
Levels of Analysis
  • In traditional linguistics people talk about
    several levels of analysis, or types of
    linguistic knowledge.
  • Morphology
  • How words are constructed.
  • Syntax
  • Structural relations between words.
  • Semantics
  • The meaning of words and of combinations of words.
  • Pragmatics
  • How a sentence is used: what is its purpose?
  • Discourse (sometimes distinguished as a subfield
    of Pragmatics)
  • Relationships between sentences; global context.

6
Morphology
  • Morphology: how words are constructed (prefixes,
    suffixes).
  • The simple cases are:
  • kick, kicks, kicked, kicking
  • But other cases may be:
  • sit, sits, sat, sitting
  • It is not just as simple as adding and deleting
    certain endings, as in:
  • gorge, gorgeous
  • good, goods
  • arm, army
  • This might be very different in other
    languages...
  • (Problems? Solutions?)
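
To make the difficulty concrete, here is a minimal sketch of naive suffix stripping. The suffix list and the three-letter minimum are assumptions chosen for illustration, not a method from the course. It handles kick/kicks/kicked/kicking, wrongly relates gorgeous to gorge and army to arm, and does nothing useful for sat.

```python
# Hypothetical suffix list, chosen only to illustrate the slide's examples.
SUFFIXES = ["ing", "ous", "ed", "s", "y"]


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for w in ["kicks", "kicked", "kicking", "gorgeous", "goods", "army", "sat"]:
    # The regular forms reduce to "kick"; "gorgeous" and "army" are
    # over-stripped; the irregular "sat" is left unrelated to "sit".
    print(w, "->", naive_stem(w))
```

The over-stripped cases are exactly the ones the slide warns about: the string operation succeeds, but the resulting "root" is unrelated in meaning.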

7
Syntax
  • Syntax: structural relationships between words.
  • The main issues here are structural ambiguities,
    as in:
  • I saw the Grand Canyon flying to New York.
  • or
  • Time flies like an arrow.
  • The sentence can be interpreted as:
  • Metaphor: time passes quickly, but also
  • Declarative: insects ("time flies") have an affinity
    for arrows
  • Imperative: measure the time of the insects.
  • Key issue: often syntax doesn't tell us much
    about meaning.
  • Plastic cat food can cover

8
Semantics
  • Semantics: the meaning of words and of
    combinations of words.
  • Some key issues here:
  • Lexical ambiguities
  • I walked to the bank of the river / to get money.
  • The bug in the room was probably planted by spies /
    flew out the window.
  • Compositionality: the meaning of
    phrases/sentences as a function of the meaning of
    the words in them.
  • (Problems? Solutions?)

9
Pragmatics/Discourse
  • Pragmatics: how a sentence is used; its purpose.
  • E.g., rules of conversation:
  • Can you tell me what time it is?
  • Could I have the salt?
  • Discourse: relations between sentences; global
    context.
  • An important example here is the problem of
    co-reference:
  • When Chris was three years old, his father wrote
    a poem about him.
  • Chicago?
  • (Running towards an agent at an airport ticket
    agency)

10
Morphology and Part-of-Speech
  • Words are related by morphological processes such
    as:
  • forming plural forms from singular forms:
    dog ... dogs
  • adding prefixes and suffixes:
    conceive ... inconceivable
  • Importance?
  • It makes language more predictable.
  • It allows us to handle new words which are
    outside our vocabulary.
  • Understanding morphology may support
    generalization to unknown words.
  • However, morphology may be tricky.
  • It is not always as simple as stripping common
    prefixes and suffixes:
  • preempt....... empt ?
  • gorgeous.... like a gorge?
  • apply........... like an apple?
  • old.............. oldly?
  • Mrs. .......... plural of Mr.
  • atomic......... not Tom-like

11
Morphological Processes
  • Inflectional forms
  • The words generated share the same basic meaning
    and part of speech.
  • Words are generated by systematic modifications
    of the root forms.
  • kick, kicks, kicked, kicking
  • Derivational forms
  • The words generated may have a different meaning
    and part of speech.
  • friend ... friendly; wide ... widely; hard ... hardly
  • Is there a problem to solve here? What would you
    consider a solution?

12
Part of Speech
Data / Demo
  • The part of speech of the words in a sentence plays
    an important role in all recent work in natural
    language. It is necessary for reading the
    literature and the corpora.
  • Part of speech (POS) is a way to categorize words
    based on the particular syntactic (and often
    semantic) function they take in the sentence.
  • Sometimes called syntactic or grammatical
    categories.
  • Important POS:
  • Nouns: typically refer to people, animals and
    things.
  • Verbs: express the action in the sentence.
  • Adjectives: describe properties of nouns.
  • Children eat sweet candy
  • Children: noun; a group of people.
  • eat: verb; describes what people do with candy.
  • sweet: adjective; a property of candy.
  • candy: noun; a particular type of food.
  • Other basic parts of speech: adjective, adverb,
    article, pronoun, conjunction

13
Part of Speech (cont.)
  • A useful sub-categorization of POS into two types:
  • Open-class words
  • A constantly changing set: new words are often
    introduced into the language.
  • nouns, verbs, adjectives and adverbs
  • Closed-class words
  • A relatively stable set: new words are rarely
    introduced into the language.
  • articles, pronouns, prepositions, conjunctions.
  • It is therefore easier to deal with closed-class
    words.
  • Articles: a, an, the
  • Pronouns: I, you, me, we, he, she, him,
    her, it, them, they
  • Prepositions: to, for, with, between, at,
    of
  • Demonstratives: this, that, these, those
  • Quantifiers: some, every, most, any, both
  • Conjunctions: and, or, but
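
Because closed-class sets are small and stable, they can simply be enumerated. The sketch below is only an illustration: the word lists are copied from this slide, while the tag names and the "open-class?" fallback label are assumptions. It looks words up in such a list and leaves everything else undecided.

```python
# Word lists copied from the slide; the tag names are illustrative.
CLOSED_CLASS = {
    "article": {"a", "an", "the"},
    "pronoun": {"i", "you", "me", "we", "he", "she", "him", "her",
                "it", "them", "they"},
    "preposition": {"to", "for", "with", "between", "at", "of"},
    "demonstrative": {"this", "that", "these", "those"},
    "quantifier": {"some", "every", "most", "any", "both"},
    "conjunction": {"and", "or", "but"},
}


def closed_class_tag(word):
    """Return the closed-class tag of a word, or a placeholder label."""
    lowered = word.lower()
    for tag, members in CLOSED_CLASS.items():
        if lowered in members:
            return tag
    return "open-class?"  # nouns, verbs, adjectives, adverbs, unknown words


sentence = "He gave the book to Mary".split()
print([(w, closed_class_tag(w)) for w in sentence])
```

A lookup like this identifies which closed class a word may belong to, but it does not say how the word is used in context: "to" can also mark an infinitive, and "that" can also be a relative pronoun.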

14
Closed class words (not so easy)
  • Articles pose a lot of difficulty for language
    generation.
  • Most noun phrases start with an article:
  • a newspaper, an apple, the movie
  • But there are many exceptions:
  • The bowl was full of rice. vs. The bowl
    was full of apple.
  • I go to college. vs. I go to university.
  • She went on vacation. vs. She went on trip.
  • He fell asleep in class. vs. He fell asleep
    in room.
15
Closed class words (not so easy-II)
  • Other closed-class words that are hard to deal
    with: prepositions and particles.
  • Prepositions represent relations: time,
    location, modification, complements.
  • He put the book on the table.
  • He gave the book to Mary.
  • He walked up the stairs.
  • Particles are prepositions that follow verbs to
    create new verb forms.
  • He passed out.
  • But also:
  • He threw the cookies up the chimney. vs.
    He threw up the cookies.
  • And sometimes it can be ambiguous:
  • He looked over the paper.
  • Other problems with prepositions include
    attachments, which will be discussed later when
    we discuss semantics.
  • Problems? Solutions?

POS? Disambiguation? Text Correction?
16
Nouns
  • Nouns refer to entities in the world, which
    represent objects, places, concepts, people,
    events:
  • dog, city, idea, marathon
  • Count nouns: describe specific objects or sets
    of objects (above).
  • Mass nouns: describe composites or substances:
  • dirt, water, garbage, deer.
  • Pronouns are a special class of nouns that refer
    to a person or a "thing" that is salient in the
    context of use.
  • After Mary had arrived in the village, she looked
    for a hotel.
  • Relative pronouns are pronouns like
  • who, which, that
  • The man who saw Elvis ... The UFO that landed
    in Toledo ...
  • The Rolling Stones concert, which I attended, ...

17
Nouns (cont.)
  • Nouns can be objects of verbs or subjects of
    verbs:
  • Children (subject) eat sweet candy (object).
  • Proper nouns are names like
  • Mary, Smith, United States, IBM, Little Rock.
  • Nouns have modifiers. They can be modified by
  • adjectives: words that attribute qualities to
    objects,
  • wet, loud, happy, funny; or by
  • noun modifiers:
  • dog food, tin can, song book.
  • In this case we can talk about the head noun,
    which represents the main concept, e.g., "food"
    in "dog food".
  • A noun is usually embedded in a noun phrase:
  • a syntactic unit of the sentence in which
    information about the noun is gathered.
  • The noun is the head of the noun phrase.
  • In addition to the noun we may find in a noun
    phrase an article ("the tree") and an adjective
    ("the tall tree").
  • Problems? Solutions?

Identification? Why do we need to solve it? How
to evaluate it?
18
Verbs
  • Verbs: words that represent actions, commands or
    assertions.
  • Main verbs: walk, eat, believe, claim, ask
  • Auxiliary verbs: be, do, have
  • Modal verbs: will, can, could
  • Verbs can be
  • transitive: they take a complement, as in
  • eat an apple, read a book, sing a song
  • intransitive: they do not take complements,
    as in
  • she laughed, he slept, I lied

19
Verbs (cont.)
  • Verbs have morphological forms:
  • Base:               walk     be     go
  • Present:            walks    is     goes
  • Past:               walked   was    went
  • Present participle: walking  being  going
  • Past participle:    walked   been   gone
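
The regular and irregular patterns above can be sketched as a rule plus an exception table. This is a toy illustration under stated assumptions: the function name, the form labels and the exception table are hypothetical, and English spelling rules (kiss/kisses, hope/hoping) are deliberately ignored.

```python
# Exception table for the two irregular verbs on the slide (illustrative).
IRREGULAR = {
    "be": {"present": "is", "past": "was",
           "present_participle": "being", "past_participle": "been"},
    "go": {"present": "goes", "past": "went",
           "present_participle": "going", "past_participle": "gone"},
}


def verb_forms(base):
    """Build the regular paradigm, then overwrite any irregular forms."""
    forms = {
        "base": base,
        "present": base + "s",
        "past": base + "ed",
        "present_participle": base + "ing",
        "past_participle": base + "ed",
    }
    forms.update(IRREGULAR.get(base, {}))
    return forms


for verb in ["walk", "be", "go"]:
    print(verb_forms(verb))
```

The design mirrors the point of the slide: inflection is systematic for most verbs, so only the exceptions need to be stored explicitly.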

20
Verbs (cont.)
  • Verbs can be active or passive.
  • The passive voice form consists of a form of "to
    be" followed by the past participle.
  • Active / Passive
  • I saw Elvis. / Elvis was seen by me.
  • I will find him. / He will be found by me.
  • I have found him. / He has been found by me.
  • The roles are reversed in actives and passives:
  • John killed Sam: subject is killer, direct object
    is victim.
  • Sam was killed by John: subject is victim, object
    of "by" is killer.
  • Some verbs take indirect objects, e.g.,
  • I gave Mary the book vs. I gave
    the book to Mary.
  • Mary: indirect object; book: direct object.

21
Verbs (cont.)
  • Prepositions and Particles are important in the
    context of verbs.
  • When they appear as Particles they create new
    verb forms.
  • Sometimes, we need to know the meaning of the
    sentence to decide if a word is a preposition or
    a particle.
  • She ran up the hill. vs. She ran up the
    bill.

22
Verb Phrases
  • The verb phrase is the syntactic unit that
    organizes all elements of the sentence that
    depend syntactically on the verb.
  • The Verb is the head of the verb phrase.
  • An adverb is an element of the verb phrase that
    specifies
  • place, time, manner, degree:
  • She often travels to Las Vegas.
  • She allegedly committed perjury.
  • She started her career off impressively.

23
Verb Sub-categorization
  • This is a categorization of verbs according to
    the types of complements they take.
  • Complements of a verb are different syntactic
    means that verbs can exploit to express related
    entities.
  • The set of complements that a verb can appear
    with is called its subcategorization frame.
  • Examples: VerbNet

24
Sub-categorization Frames
  • Intransitive: NP (subject)
  • The woman walked.
  • Transitive: NP (subject), NP (object)
  • John loves Mary.
  • Double-object construction: NP (subject), NP
    (indirect object), NP (direct object)
  • Mary gave John flowers.
  • Reflexive verbs: NP (subject),
    reflexive pronoun (object)
  • She introduced herself.
  • NP (subject), NP (object), PP (location)
  • She put the book on the table.
  • Clause complement: NP (subject), NP (object),
    "that" clause
  • She told me that Gary is coming.
  • Complements of verbs can be either
  • Obligatory arguments (subject, object, direct
    object):
  • She put the book on the table.
    or
  • Optional (like a PP or a subordinate clause,
    e.g., a "that" clause):
  • She gave her presentation on the stage.
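
One simple way to picture subcategorization frames is as a lexicon that maps each verb to the complement sequences it licenses. The entries, the label "S-that" and the function name below are hypothetical and far smaller than a resource such as VerbNet. Following the slide, the subject NP is listed as the first element of each frame.

```python
# Hypothetical mini-lexicon: verb -> list of licensed frames.
SUBCAT = {
    "walk": [("NP",)],                    # The woman walked
    "love": [("NP", "NP")],               # John loves Mary
    "give": [("NP", "NP", "NP"),          # Mary gave John flowers
             ("NP", "NP", "PP")],         # Mary gave flowers to John
    "put":  [("NP", "NP", "PP")],         # She put the book on the table
    "tell": [("NP", "NP", "S-that")],     # She told me that Gary is coming
}


def licensed(verb, frame):
    """True if the verb is listed with exactly this sequence of arguments."""
    return tuple(frame) in SUBCAT.get(verb, [])


print(licensed("put", ["NP", "NP", "PP"]))  # obligatory PP is present
print(licensed("put", ["NP", "NP"]))        # "She put the book" is rejected
```

This sketch treats every listed argument as obligatory; distinguishing obligatory from optional complements would need an extra marker per slot.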

26
Syntactic and Semantic Regularities
  • Subcategorization frames capture syntactic
    regularities.
  • There are also semantic regularities, usually
    called selectional restrictions or preferences.
  • E.g., "bark" prefers dogs as subjects;
    "eat" prefers edible things as objects.
  • Sentences that violate selectional preferences
    sound odd:
  • The cat barked all night.
  • I eat philosophy every day.
  • A last word about verbs:
  • Gerunds are present participles that function as
    nouns.
  • sleeping bags, drinking fountain, moving sale
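
Selectional preferences can be sketched as a table from (verb, role) pairs to preferred semantic classes. Everything in this example is a hand-made assumption for illustration (the class labels, the tiny noun inventory, the function name); real systems typically estimate such preferences from data.

```python
# Hypothetical semantic classes for a handful of nouns.
SEMANTIC_CLASS = {
    "dog": "dog",
    "cat": "cat",
    "candy": "edible",
    "spinach": "edible",
    "philosophy": "abstract",
}

# Hypothetical preferences, following the two examples on the slide.
PREFERENCES = {
    ("bark", "subject"): {"dog"},
    ("eat", "object"): {"edible"},
}


def violates(verb, role, noun):
    """True if a stated preference exists and the noun's class is outside it."""
    preferred = PREFERENCES.get((verb, role))
    if preferred is None:
        return False  # no preference recorded for this verb and role
    # Nouns missing from SEMANTIC_CLASS are also flagged in this sketch.
    return SEMANTIC_CLASS.get(noun) not in preferred


print(violates("bark", "subject", "cat"))        # The cat barked all night.
print(violates("eat", "object", "philosophy"))   # I eat philosophy every day.
print(violates("eat", "object", "spinach"))
```

A hard yes/no check like this is only a first approximation: the odd sentences are still understandable, which is why the slide speaks of preferences as well as restrictions.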

27
Syntax
  • Words in a sentence are not randomly strung
    together in a sequence.
  • Words are organized in phrases and arranged in a
    particular word order.
  • Syntax is the study of regularities and laws of
    word order and phrase structure.
  • In English, we cannot determine the meaning of
    the sentence from the meaning of the words alone:
  • Mary gave Peter a book. vs. Peter
    gave Mary a book.
  • The basic word order in English is
    Subject-Verb-Object.
  • This holds for declarative sentences,
  • The children should eat spinach.
  • but the order changes to express a particular
    "mood":
  • Interrogative (question): Should the children
    eat spinach? (Try on demos.)
  • Imperative (command, request): Eat spinach!

28
Rewrite Rules
  • The regularities of word order are captured using
    rewrite rules.
  • The symbol on the left of the rule can be
    re-written as the sequence of symbols on the right.
  • S -> NP VP    NP -> John, garbage    VP -> laughed,
    smells
  • This set of rewrite rules can produce the
    following sentences:
  • John laughed. Garbage laughed. John smells.
    Garbage smells.
  • Symbols that cannot be decomposed are called
    terminal symbols.
  • Symbols that can be decomposed are called
    nonterminals.
  • An intuitive way to represent a sentence
    structure is as a tree, in which each nonterminal
    represents the application of a rewrite rule.
  • The following example presents a tree
    representation of the sentence
  • John walked the dog with fleas.
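
As a minimal sketch of how rewrite rules generate sentences, the toy grammar with S, NP and VP can be written as a dictionary and expanded exhaustively. The representation and the function name are assumptions made for illustration, and this exhaustive expansion only terminates because the toy grammar is not recursive.

```python
from itertools import product

# The toy grammar: each nonterminal maps to its right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["John"], ["garbage"]],
    "VP": [["laughed"], ["smells"]],
}


def expand(symbol):
    """Yield every word sequence derivable from the given symbol."""
    if symbol not in GRAMMAR:  # terminal symbol: cannot be decomposed
        yield [symbol]
        return
    for rhs in GRAMMAR[symbol]:
        # Combine every expansion of each right-hand-side symbol.
        for parts in product(*(list(expand(s)) for s in rhs)):
            yield [word for part in parts for word in part]


sentences = [" ".join(words) for words in expand("S")]
print(sentences)
# ['John laughed', 'John smells', 'garbage laughed', 'garbage smells']
```

Note that the grammar happily generates "garbage laughed": rewrite rules capture word order, not whether the sentence makes sense.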

30
Rewrite Rules
  • This is produced using a set of rewrite rules
    that we call the grammar.
  • Grammar: a formal specification of the structures
    allowable in a language.
  • A grammar that can produce this tree is:
  • S -> NP VP
  • NP -> Det NP
  • NP -> Det noun PP
  • NP -> ADJ NP
  • NP -> noun NP
  • NP -> noun PP
  • NP -> noun
  • VP -> V NP PP
  • VP -> V NP
  • VP -> V PP
  • VP -> V
  • PP -> Prep NP PP
  • PP -> Prep NP

[Figure: parse tree of "John walked the dog with the fleas", with node labels S, NP, VP, V, N, Det, PP.]
But the same grammar can also produce other
trees, e.g., the one that means that the fleas
helped John walk the dog. That is, the grammar
is not enough.
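
The ambiguity can be checked mechanically by counting parse trees. The sketch below is an illustration, not the course's parser: it uses a simplified subset of the grammar above (the rule NP -> Det N is an assumption that stands in for the slide's two-step NP -> Det NP, NP -> noun), a hypothetical one-tag-per-word lexicon, and a memoized count over spans.

```python
from functools import lru_cache

# Simplified subset of the slide's grammar (an assumption for this sketch).
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N"), ("Det", "N", "PP"), ("N",)],
    "VP": [("V", "NP", "PP"), ("V", "NP")],
    "PP": [("Prep", "NP")],
}
# Hypothetical lexicon: exactly one part of speech per word.
LEXICON = {"John": "N", "walked": "V", "the": "Det",
           "dog": "N", "with": "Prep", "fleas": "N"}


def count_parses(words, start="S"):
    """Count the distinct parse trees of the word list under GRAMMAR."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def derive(symbol, i, j):
        # Number of trees rooted in `symbol` that cover words[i:j].
        if symbol not in GRAMMAR:  # part-of-speech tag: must match one word
            return int(j == i + 1 and LEXICON.get(words[i]) == symbol)
        return sum(cover(rhs, i, j) for rhs in GRAMMAR[symbol])

    @lru_cache(maxsize=None)
    def cover(rhs, i, j):
        # Number of ways the symbol sequence `rhs` covers words[i:j].
        if len(rhs) == 1:
            return derive(rhs[0], i, j)
        return sum(derive(rhs[0], i, k) * cover(rhs[1:], k, j)
                   for k in range(i + 1, j))

    return derive(start, 0, len(words))


print(count_parses("John walked the dog with the fleas".split()))  # 2
print(count_parses("John walked the dog".split()))                 # 1
```

The two trees correspond to attaching "with the fleas" to the noun phrase (the dog has fleas) or to the verb phrase (the walking was done with the fleas); nothing in the grammar itself prefers one over the other.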
31
Parsing
  • A parsing technique is a method for determining
    the structure of a sentence with respect to
    a (given) grammar.
  • A parser is a computer program that determines
    the structure of the sentence. It is not to be
    confused with a program that induces the grammar.
  • Lexical vs. non-lexical grammar: many grammars
    today are lexicalized in that the rewrite rules
    include specific words.
  • Notice that rewrite rules can be applied
    recursively. This is important, since it allows
    simple nonterminals to expand to a large
    number of words. This allows for the generation
    of many long-term dependencies, e.g., between
    subjects and verbs, and is a source of
    difficulties in NLP.
  • A shallow parse is a parse of the sentence at a
    shallow level: only one or two levels above the
    non-terminals. This is considered an easier task
    that, quite often, can be solved more robustly.
  • There are multiple grammar formalisms. What we
    showed here is a constituent-based formalism,
    but there exist others.
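
To give a feel for shallow parsing, here is a minimal noun-phrase chunker over already tagged words. It is a hand-written pattern (optional Det, any number of Adj, one or more N), offered only as a sketch; the tag names and the function are assumptions, and practical chunkers are usually learned from annotated data.

```python
def np_chunks(tagged):
    """Group maximal Det? Adj* N+ runs into flat noun-phrase chunks."""
    chunks, i, n = [], 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "Det":             # optional determiner
            j += 1
        while j < n and tagged[j][1] == "Adj":  # any number of adjectives
            j += 1
        k = j
        while k < n and tagged[k][1] == "N":    # one or more nouns
            k += 1
        if k > j:                             # found at least one noun
            chunks.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:                                 # no chunk starts here
            i += 1
    return chunks


tagged = [("John", "N"), ("walked", "V"), ("the", "Det"), ("dog", "N"),
          ("with", "Prep"), ("the", "Det"), ("fleas", "N")]
print(np_chunks(tagged))  # ['John', 'the dog', 'the fleas']
```

The chunker never decides where "with the fleas" attaches, which is one reason shallow analysis can be more robust than a full parse.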

32
Semantics
  • Semantics: the study of the meaning of language.
    It can be decomposed into
  • Lexical semantics: the study of the meaning of
    individual words.
  • Global semantics: how the meanings of individual
    words are combined into the meaning of sentences
    (or more).
  • One approach to lexical semantics is to study how
    word meanings are related to each other. To study
    this, words can be organized into lexical
    hierarchies
  • (as done in WordNet).

33
Lexical Semantics
  • Hypernym: a word with a more general sense.
  • hypernym(cat) = animal
  • Hyponym: a word with a more specific sense.
  • Antonym: a word having the opposite meaning.
  • antonym(hot) = cold
  • Meronym: part-of.
  • meronym(tree) = leaf
  • Synonym: same meaning.
  • Homonyms: words that are written the same way but
    represent different words.
  • bank (river, finance); suit (law, set of
    garments)
  • Polysemy: a word with several related senses.
  • Branch: natural subdivision of a plant;
  • separate but dependent part of an
    organization.
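
A lexical hierarchy can be pictured as a mapping from each word to its hypernym. The tiny table below is hypothetical and is not WordNet data; it only illustrates how hypernym and hyponym questions reduce to walking up such a hierarchy.

```python
# Hypothetical miniature hierarchy: word -> its direct hypernym.
HYPERNYM = {
    "kitten": "cat",
    "cat": "animal",
    "dog": "animal",
    "animal": "entity",
}


def hypernym_chain(word):
    """List the increasingly general senses above a word."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def is_a(word, ancestor):
    """True if `ancestor` is a (possibly indirect) hypernym of `word`."""
    return ancestor in hypernym_chain(word)


print(hypernym_chain("kitten"))  # ['cat', 'animal', 'entity']
print(is_a("cat", "animal"))     # True
print(is_a("animal", "cat"))     # False
```

A single-parent dictionary is a simplification: it cannot represent homonyms or polysemous words such as "bank" or "branch", which need one entry per sense.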

34
Lexical Semantics
  • When we move to global semantics, the natural
    problem is:
  • How to use the meaning of single words to produce
    the meaning of a sentence?
  • This is a hard problem, since natural language
    does not obey the principle of compositionality.
  • E.g., the word "white" refers to different colors
    in the following expressions:
  • white paper, white hair, white skin, white
    wine
  • There are problems of idioms and of the scope of
    words in the sentence that make this even harder.

Multi-word expressions
35
Pragmatics
  • One of the important issues studied here is that
    of discourse analysis.
  • A central problem there is the resolution of
    anaphoric relations.
  • An example:
  • Mary helped the other passenger out of the cab.
    The man had asked her to help him because of his
    foot injury.
  • Anaphoric relations hold between noun phrases
    that refer to the same thing in the world.
  • In the above example, there are quite a few ways
    to resolve the identity of "the man", "him" and
    "his foot".
  • This issue is important in many applications, in
    particular in information extraction, where
    there is a need to keep track of participants.
  • The reference problem vs. the co-reference
    problem.

36
Summary
  • Linguistics is traditionally subdivided into:
  • Phonetics (physical sounds of the language:
    consonants, vowels, intonation),
  • Phonology (how sounds are mentally represented),
  • Morphology,
  • Syntax,
  • Semantics and
  • Pragmatics.
  • Most of the work within the statistical and
    learning-based approaches to natural language is
    done in the areas of Syntax, Semantics, and some
    Pragmatics, and this will be our main concern in
    this course as well.
  • Phonetics is also studied using related methods
    within the Speech community, and the techniques
    we will present in this course could be used
    there, as well as in Morphology and Discourse
    analysis.