Title: NLP 2?? ??
- Linguistic Essentials (Ch. 3)
Competence and Performance
- Innate vs. learning, categorical vs. statistical
- CFG (context-free grammar)
- Performance
The Description of Language
- Grammar
  - a set of rules which describe what is allowable in a language
- Classic grammars (Quirk et al.)
  - meant for humans who know the language
  - definitions and rules are mainly supported by examples
  - no (or almost no) formal description tools; cannot be programmed
- Explicit grammars (CFG, LFG, GPSG, HPSG, dependency grammars, link grammars, ...)
  - formal description
  - can be programmed and tested on data (texts)
Levels of (Formal) Description
- 6 basic levels (more or less explicitly present in most theories)
  - and beyond (pragmatics/logic/...)
- meaning (semantics)
- (surface) syntax
- morphology
- phonology
- phonetics/orthography
- Each level has an input and an output representation
  - the output from one level is the input to the next (upper) level
  - sometimes levels might be skipped (merged) or split
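The cascade of levels can be sketched as plain function composition, where each level consumes the previous level's output. This is a minimal illustration only: the function names (`orthography`, `phonology`, `morphology`, `analyze`) and the four-word lexicon are hypothetical toy stand-ins for the three lowest levels, and real levels are far richer.

```python
from functools import reduce
from typing import Any, Callable

def orthography(text: str) -> str:
    """Toy normalization: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())

def phonology(text: str) -> list[str]:
    """Toy stand-in: treat each normalized token as a phoneme sequence."""
    return text.split()

def morphology(tokens: list[str]) -> list[tuple[str, str]]:
    """Toy stand-in: look up (lemma, tag) pairs in a four-word lexicon."""
    lexicon = {"i": ("I", "PP1"), "see": ("see", "VB"), "a": ("a", "DT"), "dog": ("dog", "NN")}
    return [lexicon.get(t, (t, "UNK")) for t in tokens]

# Lowest level first: each level's output is the next level's input.
LEVELS: list[Callable[[Any], Any]] = [orthography, phonology, morphology]

def analyze(signal: str) -> Any:
    return reduce(lambda data, level: level(data), LEVELS, signal)

print(analyze("I  see a DOG"))
```

Skipping or merging a level, as mentioned above, amounts to editing the `LEVELS` list.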
Phonetics/Orthography
- Input
  - acoustic signal (phonetics) / text (orthography)
- Output
  - phonetic alphabet (phonetics) / text (orthography)
- Deals with
  - Phonetics
    - consonant and vowel (and other) formation in the vocal tract
    - classification of consonants, vowels, ... in relation to frequencies, shape and position of the tongue and various muscles in the vocal tract
    - intonation
  - Orthography: normalization, punctuation, etc.
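As a rough illustration of orthographic normalization, the sketch below uses Python's standard `unicodedata.normalize` to compose combining characters (NFC), maps typographic quotes to ASCII ones, and collapses whitespace. Which normalizations to apply is a design choice; this particular set is an assumption.

```python
import re
import unicodedata

def normalize(text: str) -> str:
    """Toy orthographic normalization (the chosen steps are illustrative)."""
    # Compose base letters + combining marks, e.g. "e" + U+0301 -> "é".
    text = unicodedata.normalize("NFC", text)
    # Map typographic quotes to plain ASCII quotes.
    text = text.translate(str.maketrans({"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"}))
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()

print(normalize("  cafe\u0301   \u201cdon\u2019t\u201d  "))
```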
Phonology
- Input
  - sequence of phones/sounds (in a phonetic alphabet) or normalized text (sequence of (surface) letters in one language's alphabet)
  - NB (nota bene, "note well"): phones vs. phonemes
- Output
  - sequence of phonemes ((lexical) letters in an abstract alphabet)
- Deals with
  - relation between sounds and phonemes (units which might have some function on the upper level)
  - e.g. [u] ~ oo (as in book), [æ] ~ a (cat), i ~ y (flies)
Morphology
- Input
  - sequence of phonemes ((lexical) letters)
- Output
  - sequence of pairs (lemma, (morphological) tag)
- Deals with
  - composition of phonemes into word forms and their underlying lemmas (lexical units) and morphological categories (inflection, derivation, compounding)
  - e.g. quotations -> quote/V + -ation (der. V->N), NNS
(Surface) Syntax
- Input
  - sequence of pairs (lemma, (morphological) tag)
- Output
  - sentence structure (tree) with annotated nodes (all lemmas, (morphosyntactic) tags, functions), of various forms
- Deals with
  - the relation between lemmas and morphological categories and the sentence structure
  - uses syntactic categories such as Subject, Verb, Object, ...
- e.g. I/PP1 see/VB a/DT dog/NN
  - ((I/sg)SB ((see/pres)V (a/ind dog/sg)OBJ)VP)S
Meaning (Semantics)
- Input
  - sentence structure (tree) with annotated nodes (lemmas, (morphosyntactic) tags, surface functions)
- Output
  - sentence structure (tree) with annotated nodes (autosemantic lemmas, i.e. lemmas that have meaning in isolation; (morphosyntactic) tags; deep functions)
- Deals with
  - relation between categories such as Subject, Object and (deep) categories such as Agent, Effect; adds other categories
- e.g. ((I)SB ((was seen)V (by Tom)OBJ)VP)S
  - (I/Sg/Pat/t (see/Perf/Pred/t) Tom/Sg/Ag/f)
...and Beyond
- Input
  - sentence structure (tree) with annotated nodes (autosemantic lemmas, (morphosyntactic) tags, deep functions)
- Output
  - logical form, which can be evaluated (true/false)
- Deals with
  - assignment of objects from the real world to the nodes of the sentence structure
- e.g. (I/Sg/Pat/t (see/Perf/Pred/t) Tom/Sg/Ag/f)
  - see(Mark-Twain[SSN:...],Tom-Sawyer[SSN:...])[Time:bef 99/9/27/14:15][Place:39°19'40"N 76°37'10"W]
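Evaluating a logical form can be sketched with a toy model: the world is a set of ground facts, real-world referents are assigned to the lemmas, and the resulting atom is looked up. The `WORLD` and `REFERENTS` tables are hypothetical, time and place are ignored, and the argument order simply copies the example above.

```python
# Hypothetical world model: the set of facts that are true.
WORLD = {("see", "Mark-Twain", "Tom-Sawyer")}

# Hypothetical assignment of real-world objects to lemmas of the sentence structure.
REFERENTS = {"I": "Mark-Twain", "Tom": "Tom-Sawyer"}

def evaluate(predicate: str, *lemmas: str) -> bool:
    """Replace each lemma by its referent and check the atom against WORLD."""
    atom = (predicate, *(REFERENTS[lemma] for lemma in lemmas))
    return atom in WORLD

print(evaluate("see", "I", "Tom"))   # True
print(evaluate("see", "Tom", "I"))   # False
```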
Phonology
- (Surface <-> Lexical) correspondence
  - symbol-based (no complex structures)
- En. (stem-final change)
  - lexical: b a b y + s (+ denotes the start of an ending)
  - surface: b a b i e s (phonetically related: bébì0s)
- Arabic (interfixing, inside-stem doubling) (lit. "read")
  - lexical: kTb + uu + CVCCVC (CVCC...: vowel/consonant pattern)
  - surface: kuttub
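The English correspondence (lexical `baby+s`, surface `babies`) can be approximated by one rewrite rule over the lexical string. This is a minimal sketch using a regular expression rather than a full two-level or finite-state transducer; the rule as written (consonant + y before `+s` surfaces as `ies`) is an assumption that covers only this pattern.

```python
import re

def lexical_to_surface(lexical: str) -> str:
    """Map a lexical form with '+' as the morpheme boundary to its surface form."""
    # Stem-final y after a consonant surfaces as "ie" before the ending "s".
    surface = re.sub(r"([^aeiou+])y\+s$", r"\1ies", lexical)
    # Any remaining boundary symbol has no surface realization.
    return surface.replace("+", "")

for form in ["baby+s", "boy+s", "cat+s"]:
    print(form, "->", lexical_to_surface(form))
```

The vowel check keeps *boy+s* as *boys*, which a rule keyed only on final y would get wrong.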
Phonology Examples
- German (umlaut) (Satz, "sentence")
  - lexical: s A t z + e (A denotes umlautable a)
  - surface: s ä t z e (phonetic: zæcə, vs. zac)
- Turkish (vowel harmony)
  - lexical: e v + l A r ("houses"), b a š + l A r ("heads")
  - surface: e v l e r, b a š l a r
- Czech (e-insertion, palatalization)
  - lexical: m a t E K + 0 ("mothers", gen. pl.), m a t E K + e ("mother", dat. sg.)
  - surface: m a t e k, m a t c e
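The Turkish archiphoneme A in the plural ending -lAr can be sketched as a lookup of the stem's last vowel: a front vowel selects `e`, a back vowel selects `a`. The sketch uses standard Turkish spelling (`baş` rather than the transcription `baš`) and handles only this one suffix.

```python
FRONT_VOWELS = set("eiöü")
BACK_VOWELS = set("aıou")

def plural(stem: str) -> str:
    """Realize lexical stem + lAr: A -> e after a front vowel, a after a back vowel."""
    for ch in reversed(stem):
        if ch in FRONT_VOWELS:
            return stem + "ler"
        if ch in BACK_VOWELS:
            return stem + "lar"
    raise ValueError(f"no vowel in stem {stem!r}")

print(plural("ev"), plural("baş"))  # evler başlar
```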
Parts of Speech and Morphology
- Parts of speech correspond to syntactic or grammatical categories such as noun, verb, adjective, adverb, pronoun, determiner, conjunction, and preposition.
- Word categories are systematically related by morphological processes such as the formation of the plural form from the singular form.
- The major types of morphological processes are inflection, derivation, and compounding.
Parts of Speech
- Correspond to syntactic or grammatical categories such as noun, verb, adjective, and preposition.
- Word categories are systematically related by morphological processes such as the formation of the plural form from the singular form, or of the past tense from the present tense.
The Parts of Speech
- Noun: refers to entities like people, places, things, or ideas.
- Pronoun: a word that takes the place of a noun.
- Proper noun: a name.
- Determiner: specifies the particular reference of a noun.
- Adjective: describes the properties of nouns or pronouns.
- Verb: expresses an action (or state) in a sentence.
- Adverb: modifies a verb, an adjective, or another adverb.
- And many more
POS Labeling
- Children(NOUN) eat(VERB) sweet(ADJECTIVE) candy(NOUN)
- The(ARTICLE) children(NOUN) ate(VERB) the(ARTICLE) cake(NOUN)
- The(ARTICLE) news(NOUN) has(AUXILIARY) been(MAIN VERB) quite(ADVERB) sad(ADJECTIVE) in(PREPOSITION) fact(NOUN) .(PERIOD)
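Labels like those above can be produced by a naive dictionary lookup, sketched below with a hypothetical lexicon built from the first two example sentences. Because it assigns exactly one tag per word form, it cannot handle category ambiguity (e.g. *candy* as a noun or a verb); resolving that requires looking at the context.

```python
# Hypothetical lexicon: one fixed tag per (lowercased) word form.
LEXICON = {
    "children": "NOUN", "eat": "VERB", "ate": "VERB", "sweet": "ADJECTIVE",
    "candy": "NOUN", "the": "ARTICLE", "cake": "NOUN",
}

def tag(sentence: str) -> str:
    """Return the sentence in the Word(TAG) format used above."""
    return " ".join(f"{w}({LEXICON.get(w.lower(), 'UNKNOWN')})" for w in sentence.split())

print(tag("The children ate the cake"))
```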
Morphology: Morphemes and Order
- Handles what is an isolated form in written text
- Grouping of phonemes into morphemes
  - sequence deliverables -> deliver, able, and s (3 units)
  - could as well be some ID numbers
    - e.g. deliver = 23987, s = 12, able = 3456
- Morpheme combination
  - certain combinations/sequences are possible, others are not
    - deliver+able+s, but not able+derive+s; noun+s, but not noun+ing
  - typically fixed (in any given language)
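Segmentation and combination constraints can be sketched together: split a word greedily into known morphemes, then check each adjacent pair of morpheme classes against a list of allowed sequences. The class names, the extra entry `derive`, and the `ALLOWED` table are hypothetical toy morphotactics; the numeric IDs are the ones from the slide.

```python
from typing import Optional

# Morpheme classes; "derive" is a hypothetical extra entry for the negative example.
MORPHEME_CLASS = {"deliver": "root", "derive": "root", "able": "der_suffix", "s": "infl_suffix"}
# Arbitrary numeric IDs, as on the slide.
MORPHEME_ID = {"deliver": 23987, "able": 3456, "s": 12}
# Toy morphotactics: which class may follow which ("START" = beginning of the word).
ALLOWED = {("START", "root"), ("root", "der_suffix"), ("root", "infl_suffix"),
           ("der_suffix", "infl_suffix")}

def segment(word: str) -> Optional[list[str]]:
    """Greedy longest-match segmentation into known morphemes; None if it fails."""
    parts, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in MORPHEME_CLASS:
                parts.append(word[i:j])
                i = j
                break
        else:
            return None
    return parts

def is_well_formed(parts: list[str]) -> bool:
    """Check every adjacent pair of morpheme classes against ALLOWED."""
    prev = "START"
    for part in parts:
        cls = MORPHEME_CLASS[part]
        if (prev, cls) not in ALLOWED:
            return False
        prev = cls
    return True

for word in ["deliverables", "ablederives"]:
    parts = segment(word)
    print(word, parts, [MORPHEME_ID.get(p) for p in parts], is_well_formed(parts))
```

Greedy longest match is the simplest choice here; it can fail on words where a shorter first morpheme is the right one, which a backtracking or finite-state segmenter would handle.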
Morphology: From Morphemes to Lemmas and Categories
- Lemma: lexical unit, pointer to the lexicon
  - might as well be a number, but is typically represented as the base form, or dictionary headword
  - possibly indexed when ambiguous/polysemous
    - state1 (verb), state2 (state-of-the-art), state3 (government)
  - formed from one or more morphemes (root, stem, root+derivation, ...) (derivation vs. inflection)
- Categories: non-lexical
  - small number of possible values (< 100, often < 5-10)
Morphology Level: The Mapping
- Formally: $A^+ \to 2^{L \times C_1 \times C_2 \times \cdots \times C_n}$
  - $A$ is the alphabet of phonemes ($A^+$ denotes any non-empty sequence of phonemes)
  - $L$ is the set of possible lemmas, uniquely identified
  - $C_i$ are morphological categories, such as
    - grammatical number, gender, case
    - person, tense, negation, degree of comparison, voice, aspect, ...
    - tone, politeness, ...
    - part of speech (not quite a morphological category, but...)
  - $2^{L \times C_1 \times C_2 \times \cdots \times C_n}$ denotes the power set of $L \times C_1 \times C_2 \times \cdots \times C_n$, i.e. a word form maps to a set of possible analyses
  - $A$, $L$, and $C_i$ are obviously language-dependent
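A tiny fragment of such a mapping can be written as a dictionary from word forms to sets of analyses; returning a set (an element of the power set) is what lets one form such as *saw* receive several analyses. The entries and category names below are hand-picked assumptions, not the output of any real analyzer.

```python
from typing import FrozenSet, Tuple

# One analysis = (lemma, (category, value) pairs); the category names are illustrative.
Analysis = Tuple[str, Tuple[Tuple[str, str], ...]]

# Hand-written fragment of the mapping A+ -> 2^(L x C1 x ... x Cn).
ANALYSES = {
    "saw": frozenset({
        ("see", (("POS", "V"), ("tense", "past"))),
        ("saw", (("POS", "N"), ("number", "sg"))),
    }),
    "computers": frozenset({
        ("computer", (("POS", "N"), ("number", "pl"))),
    }),
}

def morph_analyses(form: str) -> FrozenSet[Analysis]:
    """All analyses of a word form; unknown forms map to the empty set."""
    return ANALYSES.get(form, frozenset())

for lemma, cats in sorted(morph_analyses("saw")):
    print(lemma, dict(cats))
```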
The Dictionary (or Lexicon)
- Repository of information about words
- Morphological
  - description of morphological behavior: inflection patterns/classes
- Syntactic
  - part of speech
  - relations to other words
    - subcategorization (or surface valency frames)
- Semantic
  - semantic features
  - valency frames
- ...and any other! (e.g., translation)
The Categories: Part of Speech; Open and Closed Categories
- Part of Speech (POS): a pretty much stable set across languages
  - not so much morphological (can be looked up in a dictionary), but
  - morphological behavior is typically consistent within a POS category
- Open categories (open to additions)
  - verb, noun, pronoun, adjective, numeral, adverb
  - subject to inflection (in general); subject to cross-category derivations
  - newly coined words always belong to open POS categories
  - potentially unlimited number of words
- Closed categories
  - preposition, conjunction, article, interjection, clitic, particle
  - not a base for derivation (possibly only by compounding)
  - finite and (very) small number of words
The Categories: Part of Speech, Open Categories: Verbs
- Verbs
  - inflectional categories: person, number, tense, voice, aspect, gender, negation, ...
  - syntactic/semantic classification
    - ordinary: (to) speak, (to) write
    - auxiliaries: be, have, will, would, do, go (going)
    - modals: can, could, may, should, must, want
    - phasal: begin, end, start
  - morphological classification
    - conjugation type: regular/irregular (Ge. weak/strong/irregular)
    - conjugation class (Cz. 5 classes, 100 combinations)
The Categories: Part of Speech, Open Categories: Nouns
- Nouns
  - inflectional categories: number, gender, case, negation, ...
  - semantic classification
    - human/animal/(non-living) things: driver/bird/stone
    - concrete/abstract: computer/thought
    - common/proper: table/Hopkins
  - syntactic classification: countable/uncountable: book, water
  - morphological classification
    - pluralia/singularia tantum: data (is), police (are)
    - declension type (pattern or class) (Cz. 14 basic patterns, plus deviations: 300 patterns, irregular inflection)
  - adverbial nouns: afternoon, home, east (no inflection)
The Categories: Part of Speech, Open Categories: Pronouns
- Pronouns
  - inflectional categories: number, gender, case, negation, person
  - much like nouns (syntactic usage also similar)
    - (pro)noun: stands for a noun
  - classification (mostly syntactic/semantic)
    - personal: I, you, he, she, it, we, you, they
    - demonstrative: this, that
    - possessive: my, your, her, his, its, our, their; mine, yours, ours, ...
    - reflexive: myself, yourself, herself, ..., oneself
    - interrogative/relative: what, which, who, whom, whose, that
    - indefinite (nominal): somebody, something, one
  - morphological classification: mostly idiosyncratic patterns
The Categories: Part of Speech, Open Categories: Adjectives
- Adjectives
  - inflectional categories: degree of comparison, number, gender, case, negation
  - classification
    - ordinary: new, interesting, test (equipment)
    - possessive: John's, driver's
    - proper: Appalachian (Mountains)
    - often derived from verbs/nouns: teaching (assistant), trendy, stylish
  - morphological classification
    - mostly regular declension (Cz. 4 basic patterns, 10 total)
    - degrees of comparison (En. big, bigger, biggest)
    - but a large number of forms (agreement, cf. section on syntax)
The Categories: Part of Speech, Open Categories: Adverbs
- Adverbs
  - inflectional categories: degree of comparison, negation
  - open category: regular derivation from adjectives is common
    - new -> newly, interesting -> interestingly
  - non-derived adverbs
    - ordinary: so, well, just, too, then, often, there
    - wh-adverbs (interrogative): why, when, where, how
    - degree adverbs/qualifiers: very, too
  - morphological classification (not much, really...)
    - degree of comparison: well, better, best
    - soon, sooner (other languages: all 3 degrees regular)
The Categories: Part of Speech, Open Categories: Numerals
- Numerals
  - inflectional categories: number, gender, case, negation
  - open category: compounding (Ge. einundzwanzig, 21)
  - classification
    - cardinals: one, five, hundred
      - NB: million etc. often considered nouns
    - ordinals/fractionals: first, second, thirtieth
    - quantifiers: all, many, some, none
    - multiplicative: times, twice (Cz. dvaadvacetkrát, "22 times")
    - multilateral: single, triple, twofold
  - morphological classification: as nouns/adjectives, many irregular
The Categories: Part of Speech, Closed Categories
- Closed categories: preposition, conjunction, article, interjection, clitic, particle
- Morphological behavior: indeclinable (no declension, no conjugation)
  - preposition: of, without, by, to
  - conjunction
    - coordinating: and, but, or, however
    - subordinating: that, if, because, before, after, although, as
  - article: a, the
  - interjection: wow, eh, hello
  - clitic: 's (may be attached to whole phrases, at the end)
  - particle: yes, no, not, to (+ verb)
    - many (otherwise) prepositions if part of phrasal verbs, e.g. (look) up
The Categories: Number and Gender
- Grammatical number: singular, plural
  - nouns, pronouns, verbs, adjectives, numerals
  - computer / computers, (he) goes / (they) go
  - in some languages (Czech): dual (nouns, pronouns, adjectives)
    - (Pl.) nohami / (Dl.) nohama (Cz. "(by) legs (of sth)" / "(by) legs (of sb)")
- Grammatical gender: masculine, feminine, neuter
  - nouns, pronouns, verbs, adjectives, numerals
  - he/she/it: читал, читала, читало (Ru. "(he/she/it) was-reading")
  - nouns (mostly) do not change gender for a single lexical unit
  - also animate/inanimate (grammatical, for some genders), etc.
    - Mädchen (Ge. "girl", neuter), deti (Cz. "children", masc. inanim.)
The Categories: Case
- Case
  - English: only personal pronouns/possessives, 2 forms
  - other languages: 4 (German), 6 (Russian), 7 (Czech, Slovak, ...)
  - nouns, pronouns, adjectives, numerals
- Most common cases (forms in singular/plural; English / Czech třída, "class")
  - nominative: I/we (work); třída/třídy
  - genitive: (picture of) me/us; třídy/tříd
  - dative: (give to) me/us; třídě/třídám
  - accusative: (see) me/us; třídu/třídy
  - vocative: -/-; třído/třídy
  - locative: (about) me/us; třídě/třídách
  - instrumental: (by) me/us; třídou/třídami
The Categories: Person, Tense
- Person
  - verbs, personal pronouns
  - 1st, 2nd, 3rd: (I) go, (you) go, (he) goes, (we) go, (you) go, (they) go
    - jdu, jdeš, jde, jdeme, jdete, jdou (Cz.)
- Tense (En. / Cz. "go" / Pol. "go")
  - past: (you) went / - / szliście
  - present: (you pl.) go / jdete / idziecie
  - future (!if not analytical): - / půjdete / -
  - concurrent (gerund): going / jda / idąc
  - preceding: - / - / szedłszy
Note on Tense
- Grammars: more (syntactic/semantic) tenses
  - but morphology handles isolated words -> some tenses can be defined and handled only at an upper level (surface syntax)
- Examples of (traditional) tenses (synthetic and analytical)
  - infinitive: (to) write (tenseless, personless, ..., except negation (Cz.))
  - simple present/past: (I) write / (she) writes; (I, she) wrote
  - progressive present/past: (I) am writing; (I) was writing
  - perfect present/past: (I) have written; (I) had written
  - all in passive voice (cf. later), too
    - (the book) is being / has been / had been written, etc.
  - all in conditional mood, too (mood in English is not a morphological category!)
    - (the book) would have been written
The Categories: Voice, Aspect
- Voice
  - active vs. passive
    - (I) drive / (I am being) driven
    - (Ich) setzte (mich) / (Ich bin) gesetzt (Ge. "to sit down")
- Aspect
  - imperfective vs. perfective
    - покупал / купил (Ru. "I used to buy, I was buying" / "I (have) bought")
  - imperfective: continuous vs. iterative (repeating)
    - spal / spával (Cz. "I was sleeping" / "I used to sleep (every ...)")
The Categories: Negation, Degree of Comparison
- Negation
  - even in English: impossible (= not possible)
  - Cz.: every verb, adjective, adverb, and some nouns: prefix ne-
- Degree of comparison (non-analytical)
  - adjectives, adverbs
  - positive (big), comparative (bigger), superlative (biggest)
    - Pol. (new) nowy, nowszy, najnowszy
- Combination (by prefixing)
  - order? Both are possible (neg. Cz./Pol. ne-/nie-, sup. nej-/naj-)
    - Cz. nejnemožnější (the most impossible)
    - Pol. nienajwierniejszy (not the most faithful)
Typology of Languages
- By morphological features
  - Analytical: using (function) words to express categories
    - English, also French, Italian, ..., Japanese, Chinese
    - I would have been going (Pol.) szłabym
  - Inflective: using prefixes/suffixes/infixes, each combining several categories
    - Slavic: Czech, Russian, Polish, ... (not Bulgarian); also French, German, Arabic
    - (Cz. "new", acc.) novou (Adj., Fem., Sg., Acc., Non-neg., Pos.)
  - Agglutinative: one category per (non-lexical) morpheme
    - Finnish, Turkish, Hungarian
    - (Fin. plural) -i-
Categories: Tags
- Tagset
  - list of all possible combinations of category values for a given language
    - $T \subseteq C_1 \times C_2 \times \cdots \times C_n$
  - typically a string of letters and digits
    - compact system: short idiosyncratic abbreviations
      - NNS (gen. noun, plural)
    - positional system: each position i corresponds to $C_i$
      - AAMP3----2A---- (gen. Adj., Masc., Pl., 3rd case (dative), comparative (2nd degree of comparison), Affirmative (no negation))
      - tense, person, variant, etc.: N/A (marked by an empty position, or -)
- Famous tagsets: Brown, Penn, Multext-East, ...
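A positional tag can be decoded by pairing each character with its category. The position names below follow the 15-position convention of the Prague Dependency Treebank tagset, which the example tag appears to use; treat the exact names as an assumption. A `-` marks a category that does not apply.

```python
POSITIONS = [
    "POS", "SubPOS", "Gender", "Number", "Case", "PossGender", "PossNumber",
    "Person", "Tense", "Grade", "Negation", "Voice", "Reserve1", "Reserve2", "Var",
]

def decode(tag: str) -> dict[str, str]:
    """Map each non-empty position of a positional tag to its category name."""
    if len(tag) != len(POSITIONS):
        raise ValueError(f"expected {len(POSITIONS)} positions, got {len(tag)}")
    return {name: value for name, value in zip(POSITIONS, tag) if value != "-"}

print(decode("AAMP3----2A----"))
```

A positional tag stays decodable by this one loop, whereas a compact tagset such as NNS needs a separate lookup table for every tag.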
Words and Syntactic Functions
- Typically, nouns refer to entities in the world like people, animals, and things.
- Determiners describe the particular reference of a noun, and adjectives describe the properties of nouns.
- Verbs are used to describe actions, activities, and states.
- Adverbs modify a verb in the same way as adjectives modify nouns.
- Prepositions are typically small words that express spatial or temporal relationships. Prepositions can also be used as particles to create phrasal verbs.
- Conjunctions and complementizers link two words, phrases, or clauses.
Syntax or Phrase Structure: A Simple Context-Free Grammar
The Grammar
- S -> NP VP
- NP -> AT NNS | AT NN | NP PP
- VP -> VP PP | VBD | VBD NP
- PP -> IN NP
The Lexicon
- AT -> the
- NNS -> children | students | mountains
- VBD -> slept | ate | saw
- IN -> in | of
- NN -> cake
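The grammar and lexicon above are small enough to run. The sketch below counts the parse trees a sentence has under them, using a memoized recursion over word spans (a top-down variant of chart parsing, not any particular library's parser). It reads the third rule as `PP -> IN NP`, since `NP -> NP PP` and `VP -> VP PP` need a `PP`.

```python
from functools import lru_cache

RULES = {
    "S": [("NP", "VP")],
    "NP": [("AT", "NNS"), ("AT", "NN"), ("NP", "PP")],
    "VP": [("VP", "PP"), ("VBD",), ("VBD", "NP")],
    "PP": [("IN", "NP")],
}
LEXICON = {
    "AT": {"the"},
    "NNS": {"children", "students", "mountains"},
    "VBD": {"slept", "ate", "saw"},
    "IN": {"in", "of"},
    "NN": {"cake"},
}

def count_parses(sentence: str, start: str = "S") -> int:
    words = tuple(sentence.split())

    @lru_cache(maxsize=None)
    def count(symbol: str, i: int, j: int) -> int:
        """Number of trees rooted in symbol that span words[i:j]."""
        total = 0
        if j == i + 1 and words[i] in LEXICON.get(symbol, ()):
            total += 1  # lexical rule, e.g. AT -> the
        for rhs in RULES.get(symbol, []):
            if len(rhs) == 1:  # unary rule, e.g. VP -> VBD
                total += count(rhs[0], i, j)
            else:  # binary rule: try every split point
                left, right = rhs
                for k in range(i + 1, j):
                    total += count(left, i, k) * count(right, k, j)
        return total

    return count(start, 0, len(words))

print(count_parses("the children slept"))                          # 1
print(count_parses("the children ate the cake in the mountains"))  # 2
```

The two parses of the second sentence are the two attachments of *in the mountains* (to *the cake* or to *ate the cake*). The left-recursive rules terminate because each recursive call covers a strictly shorter span.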
Syntax or Phrase Structure: A Parse Tree
A Simple Context-Free Grammar
- The grammar rules
  - S -> NP V
  - NP -> N
- The lexicon
  - N -> John | Gaurav | Ram
  - V -> walks | talks | eats | went | ...
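Because this grammar has no recursion, the language it generates is finite and can be listed exhaustively. The sketch below does that by expanding symbols top-down; it keeps only the four verbs named explicitly and leaves out whatever the trailing ellipsis stands for.

```python
from itertools import product

RULES = {"S": [("NP", "V")], "NP": [("N",)]}
LEXICON = {"N": ["John", "Gaurav", "Ram"], "V": ["walks", "talks", "eats", "went"]}

def expand(symbol: str):
    """Yield every word sequence derivable from symbol (terminates: no recursive rules)."""
    for word in LEXICON.get(symbol, []):
        yield [word]
    for rhs in RULES.get(symbol, []):
        for parts in product(*(list(expand(s)) for s in rhs)):
            yield [w for part in parts for w in part]

sentences = [" ".join(words) for words in expand("S")]
print(len(sentences), sentences[:3])  # 12 ['John walks', 'John talks', 'John eats']
```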
Tag Sets
- A tag indicates one of the various conventional parts of speech.
- Different tag sets have been used, e.g. the Brown tag set and the Penn Treebank tag set.
- Tag examples: NP: proper noun, NN: singular noun, AT: article, DET: determiner.
Stochastic Grammars
- Grammars obtained by adding probabilities in a fairly transparent way to algebraic (i.e., non-probabilistic) grammars.
- Stochastic grammars supplement the underlying algebraic grammars.
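One common way of adding probabilities is a probabilistic CFG: each rule gets a probability, the rules for each left-hand side sum to 1, and the probability of a tree is the product of the probabilities of the rules it uses. The sketch below applies this to the grammar of the earlier slide; all the probability values are hypothetical.

```python
from math import isclose, prod

# The earlier grammar with hypothetical probabilities; each left-hand side sums to 1.
PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("AT", "NNS")): 0.5, ("NP", ("AT", "NN")): 0.3, ("NP", ("NP", "PP")): 0.2,
    ("VP", ("VBD",)): 0.4, ("VP", ("VBD", "NP")): 0.4, ("VP", ("VP", "PP")): 0.2,
    ("PP", ("IN", "NP")): 1.0,
    ("AT", ("the",)): 1.0,
    ("NNS", ("children",)): 0.5, ("NNS", ("students",)): 0.25, ("NNS", ("mountains",)): 0.25,
    ("VBD", ("slept",)): 0.4, ("VBD", ("ate",)): 0.3, ("VBD", ("saw",)): 0.3,
    ("IN", ("in",)): 0.5, ("IN", ("of",)): 0.5,
    ("NN", ("cake",)): 1.0,
}

def tree_prob(tree) -> float:
    """Product of the rule probabilities in a tree given as (label, child, ...); strings are words."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    return PROB[(label, rhs)] * prod(tree_prob(c) for c in children if not isinstance(c, str))

# Sanity check: the rules for each left-hand side form a probability distribution.
for lhs in {lhs for lhs, _ in PROB}:
    assert isclose(sum(p for (l, _), p in PROB.items() if l == lhs), 1.0), lhs

TREE = ("S", ("NP", ("AT", "the"), ("NNS", "children")), ("VP", ("VBD", "slept")))
print(tree_prob(TREE))
```

For "the children slept" this gives 1.0 x (0.5 x 1.0 x 0.5) x (0.4 x 0.4) = 0.04, up to floating-point rounding. The underlying algebraic grammar is unchanged; the probabilities only rank the trees it already allows.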
Dependencies
- Local dependency: a dependence between two words expressed within the same syntactic rule (n-grams model this well).
- Non-local dependency: an instance in which two words can be syntactically dependent even though they occur far apart in a sentence.
Ambiguities
- (1) Children eat sweet candy.
- (2) Too much boiling will candy the molasses.
- In sentence (1) candy is a noun, while in (2) it is a verb.
- Word category (POS) ambiguity needs to be resolved.
Ambiguities (Cont.)
- Semantic roles: determining thematic roles in a sentence
  - Agent, Patient, Experiencer, Instrument, Goal, ...
  - Raju(AGENT) hit us(PATIENT) with a ball(INSTRUMENT).
- Complicated by the notions of direct and indirect object, active and passive voice.
Ambiguities (Cont.)
- Attachment ambiguities occur with phrases that could have been generated by two different nodes in the parse tree, e.g. "saw the man in the house with a pole".
- Rare usage and spurious usage: "A hectare is a hundred ares." (Here "ares" is the plural of the noun "are", a unit of area, not the verb.)
Garden-Path Sentences
- Garden-path sentences are sentences that lead you along a path that suddenly turns out not to work, e.g. "The horse raced past the barn fell."
Local and Non-Local Dependencies
- A local dependency is a dependency between two words expressed within the same syntactic rule.
- A non-local dependency is an instance in which two words can be syntactically dependent even though they occur far apart in a sentence (e.g., subject-verb agreement; long-distance dependencies such as wh-extraction).
- Non-local phenomena are a challenge for certain statistical NLP approaches (e.g., n-grams) that model local dependencies.
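The limitation of n-grams can be made concrete by listing the words an n-gram model conditions on when it predicts a given word: only the preceding n-1 words. The sketch uses an illustrative sentence (not from the slides) in which the subject *keys* and the verb *are* must agree in number across five positions.

```python
def ngram_context(words: list[str], i: int, n: int) -> list[str]:
    """The (n-1) preceding words an n-gram model conditions on when predicting words[i]."""
    return words[max(0, i - (n - 1)):i]

# Illustrative sentence: "keys" and "are" must agree in number, five positions apart.
sentence = "the keys to the old cabinet are on the table".split()
verb = sentence.index("are")
subject = sentence.index("keys")

for n in (2, 3, 6):
    context = ngram_context(sentence, verb, n)
    print(n, context, "keys" in context)
```

A trigram model sees only *old cabinet* before *are*, and the singular *cabinet* there would favor *is*; the subject becomes visible only from order 6 (`verb - subject + 1`) upward.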
The Place of Syntax
- Between morphology and meaning
- Morphology provides/expects
  - lemmas (now it's time to extract syntactic information from a dictionary)
  - tags (part of speech and combination of morphological categories, such as number, case, tense, voice, ...)
  - and of course, we also have word order now to look at/provide
- Typically multiple input (non-disambiguated morphology) / output (multiple syntactic structures, non-disambiguated)
Words, Phrases, Clauses, Sentences
- Words
  - smallest units on the syntax level
  - function/autosemantic
- Phrases
  - consist of words and/or phrases (constituents)
- Clauses
  - have predicative meaning (single predicate)
- Sentences
  - consist of clauses (one or more)
Words
- Words
  - lexical units
  - auxiliary (function) words: have a grammatical function
  - autosemantic words (lexical words)
  - idioms
    - fixed phrases (non-compositional) -> words
- Relate to other words
  - dictionary: repository of information for each word about its (idiosyncratic) relations to other words
Phrases
- Phrases
  - sequences of words and/or phrases (i.e. of constituents)
  - may be discontinuous, sometimes
- Types of phrases
  - simple/clausal (i.e. clauses, which consist of phrases, behave like phrases... recursively!)
  - according to head type
    - Noun: a new book
    - Adjective: brand new
    - Adverbial: so much
    - Prepositional: in a class
    - Verb: catch a ball
Noun Phrases
- Head: noun
  - water
  - a book
  - new ideas
  - that small village
  - the greatest rise of interest rates since W.W.II within a single year
  - an operating system which, despite great efforts on the part of our administrators, fails all too often
Adjective Phrases
- Head: adjective
- Simple APs very common, complex APs rare
  - old
  - very old
  - really very old
  - five times older than the oldest elephant in our zoo
  - (was) sure, as far as I know, to be there first
Adverbial and Numerical Phrases
- Head: adverb
  - three times as much
  - quickly
  - really
  - (... speaks) more loudly than anybody could imagine
  - yesterday
- Numerical phrases
  - (... lasted) three hours
  - twenty-two
Prepositional Phrases
- Head: preposition
- In fact, they often play the role of adverbial phrases
  - in the City
  - at five o'clock
  - to a brighter future
  - without a glitch
  - to the point where neither of them could get out of it
  - up to five points
  - instead of Charles
Verb Phrases
- Head: verb
  - (It) rains
  - ... could ever see a large Unidentified Flying Object
  - ..., why (we) have got so much rain
  - Please!
  - On Sunday, (he) was driven to the hospital
  - (It) began to snow
  - (...) prohibits smoking in this area
Coordination of Phrases
- Head: conjunction, punctuation
  - and, or, but
  - cats and dogs
  - new or even newer
  - quickly and precisely
  - he came to the conclusion that it makes no sense to hide himself anymore and therefore we could hear him today
  - (trains) from and to Baltimore
  - eat your lunch now or at the picnic table
Ellipsis
- Word or phrase missing where one would normally expect one; often happens in dialogues
  - Whom did you see there?
  - Peter. (verb missing)
- Most common in coordination (written text)
  - Pittsburgh leads 4-0 but Detroit only 3-1. (verb missing in the 2nd part)
- Systematic in many languages: pro-drop (leave out a personal pronoun in the subject position)
  - (She) passed the exam easily.
Clauses
- Predicative function
  - some activity of some subjects/objects, somewhere in time, under certain circumstances
- Main clause
  - not part of a greater clause
- Embedded clause
  - part of another clause, having some function (like a phrase)
- Function of a clause
  - same as for a phrase, plus some more (direct speech/discourse, etc.)
Gaps (Non-Continuous Constituents)
- Constituent moves from the expected position
- Happens in questions and relative clauses
  - Who(m) do you work for <gap: whom>?
    - strictly speaking, "do you work" should be "you (do work)"
  - I don't know why we have got so much rain <gap: why>.
  - On Sundays, I usually work <gap: On Sundays> but I stay home on Tuesdays.
  - The story he never wrote <gap: the story>
  - And finally the car she was supposed to use <gap: the car> for her trip to New York broke.
  - The last two could also be considered ellipsis (of "which") plus a gap.
Sentences
- Consist of a single or several main clauses
- If several main clauses
  - coordination, much like coordinated phrases
  - more coordinating conjunctions
    - and, or, but, (and) therefore, ...
- In written text, starts with a capital letter
- Ends with a period/question mark/exclamation mark
  - not all periods end a sentence!
  - sometimes even a semicolon (;) might be a sentence break (...vague)
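The capital-letter and final-punctuation cues, plus the warning that not all periods end a sentence, can be combined into a small heuristic splitter. This is a sketch only: the abbreviation list is a deliberately tiny, hypothetical one, and the semicolon case is ignored.

```python
# Hypothetical, deliberately tiny abbreviation list; real splitters need much more.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "e.g.", "i.e.", "etc.", "vs."}

def split_sentences(text: str) -> list[str]:
    """Split after '.', '?' or '!' unless the token is a known abbreviation
    or the next token does not start with a capital letter."""
    tokens = text.split()
    sentences, current = [], []
    for idx, token in enumerate(tokens):
        current.append(token)
        is_last = idx + 1 == len(tokens)
        next_is_capital = not is_last and tokens[idx + 1][:1].isupper()
        if (token.endswith((".", "?", "!"))
                and token.lower() not in ABBREVIATIONS
                and (is_last or next_is_capital)):
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences

print(split_sentences("Dr. Smith arrived. He sat down! Did he stay?"))
```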
Syntax Representation
- Tree structure (tree in the sense of graph theory)
  - one tree per sentence
- Two main ideas for the shape of the tree
  - phrase structure (derivation tree, cf. parsing later)
    - using bracketed grouping
    - brackets annotated by phrase type
    - heads (often) explicitly marked
  - dependency structure (lexical relations: local, functions)
    - basic relation: head (governor) - dependent
    - links (edges) annotated by syntactic function (Sb, Obj, ...)
    - phrase structure implicitly present (but a 1:n mapping Dep -> PS)
Phrase Structure Tree
Dependency Tree
- Example
  - rose/Pred(shares/Sb(DaimlerChrysler's/Atr),eighths/Adv(three/Atr),to/AuxP(22/Adv))
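A dependency tree is often stored as one (form, head index, function) triple per word, with head 0 for the root; the bracketed notation above can then be regenerated recursively. The word order "DaimlerChrysler's shares rose three eighths to 22" is inferred from the example's nodes and is an assumption, as are the helper names.

```python
# (form, head index, syntactic function); heads are 1-based, 0 = artificial root.
TOKENS = [
    ("DaimlerChrysler's", 2, "Atr"),
    ("shares", 3, "Sb"),
    ("rose", 0, "Pred"),
    ("three", 5, "Atr"),
    ("eighths", 3, "Adv"),
    ("to", 3, "AuxP"),
    ("22", 6, "Adv"),
]

def children(head: int) -> list[int]:
    """1-based indices of the tokens whose head is `head`, in word order."""
    return [i for i, (_, h, _) in enumerate(TOKENS, start=1) if h == head]

def bracket(i: int) -> str:
    """Render the subtree rooted in token i as form/function(child,child,...)."""
    form, _, func = TOKENS[i - 1]
    kids = children(i)
    inner = "(" + ",".join(bracket(k) for k in kids) + ")" if kids else ""
    return f"{form}/{func}{inner}"

root = children(0)[0]
print(bracket(root))
```

Keeping only head indices makes the head-dependent relation primary; any phrase-structure bracketing has to be derived from it, which is where the 1:n mapping comes in.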
Semantic Roles
- Most commonly, noun phrases are arguments of verbs. These arguments have semantic roles: the agent of an action, the patient, and other roles such as the instrument or the goal.
- In English, these semantic roles often correspond to the notions of subject and object.
- But things are complicated by the notions of direct and indirect object, and of active and passive voice.
Subcategorization
- Different verbs can relate different numbers of entities: transitive versus intransitive verbs.
- Tightly related verb arguments are called complements, but less tightly related ones are called adjuncts. Prototypical examples of adjuncts tell us the time, place, or manner of the action or state described by the verb.
- Verbs are classified according to the types of complements they permit. This is called subcategorization. Subcategorization allows us to capture syntactic as well as semantic regularities.
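Subcategorization frames can be sketched as, for each verb, the set of complement sequences it permits; a verb phrase is licensed if its complements match one of them. The verbs and frames below are hypothetical illustrations, and adjuncts (time, place, manner) are deliberately not part of the frames.

```python
# Hypothetical subcategorization frames: the complement sequences each verb permits.
FRAMES = {
    "sleep": {()},                            # intransitive
    "eat": {(), ("NP",)},                     # optionally transitive
    "devour": {("NP",)},                      # obligatorily transitive
    "give": {("NP", "NP"), ("NP", "PP")},     # two complements, two realizations
}

def licensed(verb: str, complements: tuple[str, ...]) -> bool:
    """True if the verb's frames allow exactly these complements (adjuncts excluded)."""
    return complements in FRAMES.get(verb, set())

print(licensed("devour", ("NP",)), licensed("devour", ()), licensed("sleep", ("NP",)))
```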
Attachment Ambiguity and Garden-Path Sentences
- Attachment ambiguities occur with phrases that could have been generated by two different nodes in the parse tree: "The child ate the cake with a spoon."
- Genuinely ambiguous: "Fruit flies like a banana."
- Garden-path sentences are sentences that lead you along a path that suddenly turns out not to work: "The horse raced past the barn fell."
Semantics
- Semantics is the study of the meaning of words, constructions, and utterances.
- Semantics can be divided into two parts: lexical semantics (the meaning of individual words) and the study of how word meanings combine into the meaning of larger units such as sentences.
- Lexical semantics: hypernymy, hyponymy, antonymy, meronymy, holonymy, synonymy, homonymy, polysemy, and homophony.
- Compositionality: the meaning of the whole is built from the meanings of its parts, but it often differs from what the parts alone would suggest.
- Idioms correspond to cases where the compound phrase means something completely different from its parts (e.g., "kick the bucket").
Pragmatics
- Pragmatics is the area of study that goes beyond the meaning of a sentence and tries to explain what the speaker is really expressing.
- Its topics include the scope of quantifiers, speech acts, discourse analysis, and anaphoric relations.
- The resolution of anaphoric relations is crucial to the task of information extraction.