Title: Natural Language Processing
1Natural Language Processing
- Artificial Intelligence
- CMSC 25000
- February 28, 2002
2Agenda
- Why NLP?
- Goals & Applications
- Challenges: Knowledge & Ambiguity
- Key types of knowledge
- Morphology, Syntax, Semantics, Pragmatics, Discourse
- Handling Ambiguity
- Syntactic Ambiguity: Probabilistic Parsing
- Semantic Ambiguity: Word Sense Disambiguation
- Conclusions
3Why Language?
- Natural Language in Artificial Intelligence
- Language use as distinctive feature of human intelligence
- Infinite utterances
- Diverse languages with fundamental similarities
- Computational linguistics
- Communicative acts
- Inform, request,...
4Why Language? Applications
- Machine Translation
- Question-Answering
- Database queries to web search
- Spoken language systems
- Intelligent tutoring
5Knowledge of Language
- What does it mean to know a language?
- Know the words (lexicon)
- Pronunciation, Formation, Conjugation
- Know how the words form sentences
- Sentence structure, Compositional meaning
- Know how to interpret the sentence
- Statement, question,..
- Know how to group sentences
- Narrative coherence, dialogue
6Word-level Knowledge
- Lexicon
- List of legal words in a language
- Part of speech
- noun, verb, adjective, determiner
- Example
- Noun -> cat | dog | mouse | ball | rock
- Verb -> chase | bite | fetch | bat
- Adjective -> black | brown | furry | striped | heavy
- Determiner -> the | that | a | an
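The lexicon example above can be sketched as a small lookup table. This is only an illustrative sketch: the dictionary layout and the helper name `parts_of_speech` are assumptions, not part of the lecture.

```python
# Toy lexicon from the slide: each part of speech maps to its legal words.
LEXICON = {
    "Noun": ["cat", "dog", "mouse", "ball", "rock"],
    "Verb": ["chase", "bite", "fetch", "bat"],
    "Adjective": ["black", "brown", "furry", "striped", "heavy"],
    "Determiner": ["the", "that", "a", "an"],
}


def parts_of_speech(word):
    """Return every part of speech listed for `word` (empty list if unknown)."""
    return [pos for pos, words in LEXICON.items() if word.lower() in words]


print(parts_of_speech("cat"))    # ['Noun']
print(parts_of_speech("zebra"))  # []
```

A word that is not in the table simply gets no part of speech, which is one way to read "list of legal words in a language".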
7Word-level Knowledge Issues
- Issue 1: Lexicon Size
- Potentially HUGE!
- Controlling factor: morphology
- Store base forms (roots/stems)
- Use morphological processes to generate / analyze
- E.g. dog: dog(s); sing: sings, sang, sung, singing, singer, ...
- Issue 2: Lexical ambiguity
- rock: N/V; dog: N/V
- Time flies like a banana
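A minimal sketch of the "store base forms, analyze by morphology" idea, assuming a tiny hand-written table. The names `BASE_FORMS`, `IRREGULAR`, `SUFFIXES` and `analyze` are hypothetical, and real morphological analyzers handle far more (spelling changes, prefixes, and so on).

```python
# Hypothetical mini-lexicon: base forms only, each with its possible
# parts of speech (N/V marks the lexical ambiguity of "dog" and "rock").
BASE_FORMS = {"dog": {"N", "V"}, "rock": {"N", "V"}, "sing": {"V"}}

# Irregular forms cannot be derived by suffix rules, so they are listed.
IRREGULAR = {"sang": "sing", "sung": "sing"}

# Regular suffixes, checked longest first.
SUFFIXES = ["ing", "er", "s"]


def analyze(word):
    """Return (base form, suffix, parts of speech), or None if unknown."""
    word = word.lower()
    if word in BASE_FORMS:
        return word, "", BASE_FORMS[word]
    if word in IRREGULAR:
        base = IRREGULAR[word]
        return base, "(irregular)", BASE_FORMS[base]
    for suffix in SUFFIXES:
        base = word[: -len(suffix)]
        if word.endswith(suffix) and base in BASE_FORMS:
            return base, suffix, BASE_FORMS[base]
    return None


for form in ["dogs", "singing", "singer", "sang", "banana"]:
    print(form, "->", analyze(form))
```

Only three base forms are stored, yet seven or more surface forms are recognized, which is the sense in which morphology controls lexicon size.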
8Sentence-level Knowledge Syntax
- Language models
- More than just words: "banana a flies time like"
- Formal vs natural: Grammar defines language
- Chomsky Hierarchy (most to least general)
- Recursively Enumerable: Any
- Context Sensitive: AB -> BA
- Context Free: A -> aBc
- Regular (Regular Expression): S -> aS; ab
9Syntactic Analysis Grammars
- Natural vs Formal languages
- Natural languages have degrees of acceptability
- It ain't hard; You gave what to whom?
- Grammar combines words into phrases
- S -> NP VP
- NP -> Det Adj N
- VP -> V | V NP | V NP PP
10Syntactic Analysis Parsing
- Recover phrase structure from sentence
- Based on grammar
- Parse tree (bracketed): [S [NP [Det The] [Adj black] [N cat]] [VP [V chased] [NP [Det the] [Adj furry] [N mouse]]]]
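Recovering phrase structure from a sentence can be sketched with a small backtracking top-down parser. This is one possible illustration, not the lecture's parser. The grammar uses the S, NP and VP rules from the grammar slide (the `V NP PP` alternative is left out because no PP rule is given there), and the word-to-category table, including the inflected form "chased", is an assumption.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "Adj", "N"]],
    "VP": [["V", "NP"], ["V"]],
}

# Assumed word -> part-of-speech table for the example sentence.
LEXICON = {"the": "Det", "black": "Adj", "furry": "Adj",
           "cat": "N", "mouse": "N", "chased": "V"}


def parse(symbol, words, start):
    """Yield (tree, next_position) for each way `symbol` matches words[start:]."""
    if symbol not in GRAMMAR:  # part-of-speech symbol: must match one word
        if start < len(words) and LEXICON.get(words[start]) == symbol:
            yield (symbol, words[start]), start + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand(symbol, rhs, words, start, [])


def expand(symbol, rhs, words, pos, children):
    """Match the remaining right-hand-side symbols one after another."""
    if not rhs:
        yield (symbol, *children), pos
        return
    for child, nxt in parse(rhs[0], words, pos):
        yield from expand(symbol, rhs[1:], words, nxt, children + [child])


def parse_sentence(sentence):
    """Return all S trees that cover the whole sentence."""
    words = sentence.lower().split()
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]


for tree in parse_sentence("The black cat chased the furry mouse"):
    print(tree)
```

Each tree is a nested tuple whose first element is the phrase label, mirroring the S / NP / VP structure in the figure.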
11Syntactic Analysis Parsing
- Issue 1: Complexity
- Solution 1: Chart parser (dynamic programming)
- O(n^3)
- Issue 2: Structural ambiguity
- I saw the man on the hill with the telescope
- Is the telescope on the hill?
- Solution 2 (partial): Probabilistic parsing
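Both issues can be illustrated with a memoized span-based parse counter, a simplified stand-in for a chart parser: each (symbol, start, end) subproblem is solved once and reused. The grammar below borrows the unweighted rules from the PCFG slide later in the lecture. The word categories and the function names are assumptions made for this sketch.

```python
from functools import lru_cache

GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("N",), ("Det", "N"), ("NP", "PP")],
    "VP": [("V",), ("V", "NP"), ("V", "NP", "PP")],
    "PP": [("P", "NP")],
}

# Assumed word -> part-of-speech table.
LEXICON = {"i": "N", "saw": "V", "the": "Det", "man": "N", "hill": "N",
           "telescope": "N", "with": "P", "on": "P"}


def count_parses(sentence):
    """Count the distinct S parse trees covering the whole sentence."""
    words = tuple(sentence.lower().split())

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        """Number of ways `symbol` can derive words[i:j]."""
        if symbol not in GRAMMAR:  # part of speech: exactly one word
            return int(j == i + 1 and LEXICON.get(words[i]) == symbol)
        return sum(count_seq(rhs, i, j) for rhs in GRAMMAR[symbol])

    @lru_cache(maxsize=None)
    def count_seq(rhs, i, j):
        """Number of ways the symbol sequence `rhs` can derive words[i:j]."""
        if len(rhs) == 1:
            return count(rhs[0], i, j)
        total = 0
        # The first symbol covers words[i:k]; every later symbol still
        # needs at least one word, which bounds k from above.
        for k in range(i + 1, j - len(rhs) + 2):
            left = count(rhs[0], i, k)
            if left:
                total += left * count_seq(rhs[1:], k, j)
        return total

    return count("S", 0, len(words))


print(count_parses("I saw the man with the telescope"))  # 2
print(count_parses("I saw the man on the hill with the telescope"))
```

Adding a second prepositional phrase raises the number of parses, which is why a preference mechanism such as probabilities is needed on top of the grammar.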
12Semantic Analysis
- Grammatical ≠ Meaningful
- Colorless green ideas sleep furiously
- Compositional Semantics
- Meaning of a sentence is composed from the meanings of its subparts
- Associate semantic interpretation with syntactic
- E.g. Nouns are variables (themselves): cat, mouse
- Adjectives are unary predicates: Black(cat), Furry(mouse)
- Verbs are multi-place predicates; VP: λx. chased(x, Furry(mouse))
- Sentence: (λx. chased(x, Furry(mouse)))(Black(cat))
- = chased(Black(cat), Furry(mouse))
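The composition above can be mimicked with ordinary functions: the verb phrase is a one-argument function, and the sentence meaning is that function applied to the subject. This is only a sketch that builds the logical form as a string. The helper names are assumptions.

```python
def black(x):
    """Adjective as a unary predicate over its noun."""
    return f"Black({x})"


def furry(x):
    return f"Furry({x})"


def vp_chased_furry_mouse(x):
    """VP meaning: a function still waiting for its subject x."""
    return f"chased({x}, {furry('mouse')})"


# Sentence meaning = VP meaning applied to the subject NP meaning.
sentence = vp_chased_furry_mouse(black("cat"))
print(sentence)  # chased(Black(cat), Furry(mouse))
```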
13Semantic Ambiguity
- Examples
- I went to the bank ...
- of the river
- to deposit some money
- He banked
- at First Union
- the plane
- Interpretation depends on
- Sentence (or larger) topic context
- Syntactic structure
14Pragmatics Discourse
- Interpretation in context
- Act accomplished by utterance
- Do you have the time?, Can you pass the salt?
- Requests with non-literal meaning
- Also includes politeness, performatives, etc.
- Interpretation of multiple utterances
- The cat chased the mouse. It got away.
- Resolve referring expressions
15Natural Language Understanding
- Pipeline: Input -> Tokenization/Morphology -> Parsing -> Semantic Analysis -> Pragmatics/Discourse -> Meaning
- Key issues
- Knowledge
- How to acquire this knowledge of language?
- Hand-coded? Automatically acquired?
- Ambiguity
- How to determine the appropriate interpretation?
- Pervasive, preference-based
16Handling Syntactic Ambiguity
- Natural language syntax
- Varied, has DEGREES of acceptability
- Ambiguous
- Probability framework for preferences
- Augment original context-free rules: PCFG
- Add probabilities to transitions
- NP -> N (0.2) | Det N (0.65) | Det Adj N (0.10) | NP PP (0.05)
- VP -> V (0.45) | V NP (0.45) | V NP PP (0.10)
- S -> NP VP (0.85) | S conj S (0.15)
- PP -> P NP (1.0)
17PCFGs
- Learning probabilities
- Strategy 1: Write (manual) CFG,
- Use treebank (collection of parse trees) to find probabilities
- Strategy 2: Use larger treebank (linguistic constraint)
- Learn rules & probabilities (inside-outside algorithm)
- Parsing with PCFGs
- Rank parse trees based on probability
- Provides graceful degradation
- Can get some parse even for unusual constructions (with low probability value)
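Finding rule probabilities from a treebank can be sketched with relative-frequency counting: P(A -> rhs) = count(A -> rhs) / count(A). This is the simple counting estimate, not the inside-outside algorithm named above. The two-tree treebank and all function names are made up for illustration.

```python
from collections import Counter


def rules(tree):
    """Yield (lhs, rhs) for each phrase-level node of a nested-tuple tree."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return  # part-of-speech node over a word: lexical rule, skipped here
    yield label, tuple(child[0] for child in children)
    for child in children:
        yield from rules(child)


def estimate(treebank):
    """Relative-frequency estimate of P(rule) given its left-hand side."""
    rule_counts = Counter(r for tree in treebank for r in rules(tree))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


# Hypothetical two-tree treebank.
TREEBANK = [
    ("S", ("NP", ("N", "I")),
          ("VP", ("V", "saw"), ("NP", ("Det", "the"), ("N", "man")))),
    ("S", ("NP", ("Det", "the"), ("N", "dog")),
          ("VP", ("V", "barked"))),
]

for (lhs, rhs), p in sorted(estimate(TREEBANK).items()):
    print(f"{lhs} -> {' '.join(rhs)}: {p:.2f}")
```

By construction, the probabilities of all rules sharing a left-hand side sum to 1.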
18Parse Ambiguity
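- T1 (PP attached to VP): [S [NP [N I]] [VP [V saw] [NP [Det the] [N man]] [PP [P with] [NP [Det the] [N telescope]]]]]
- T2 (PP attached to NP): [S [NP [N I]] [VP [V saw] [NP [NP [Det the] [N man]] [PP [P with] [NP [Det the] [N telescope]]]]]]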
19Parse Probabilities
- T(ree), S(entence), n(ode), R(ule)
- T1: 0.85 × 0.2 × 0.1 × 0.65 × 1 × 0.65 ≈ 0.007
- T2: 0.85 × 0.2 × 0.45 × 0.05 × 0.65 × 1 × 0.65 ≈ 0.002
- Select T1
- Best systems achieve 92-93% accuracy
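The tree probabilities can be checked by multiplying the probability of every rule used in each parse. In this sketch the pairing of rules with probabilities is an assumption inferred from the factor lists for T1 and T2 (T1 attaches the PP to the VP, T2 attaches it to the NP). The variable names are hypothetical.

```python
from math import prod

# Assumed rule probabilities, matched to the factors listed for T1 and T2.
RULE_PROB = {
    ("S", ("NP", "VP")): 0.85,
    ("NP", ("N",)): 0.2,
    ("NP", ("Det", "N")): 0.65,
    ("NP", ("NP", "PP")): 0.05,
    ("VP", ("V", "NP")): 0.45,
    ("VP", ("V", "NP", "PP")): 0.10,
    ("PP", ("P", "NP")): 1.0,
}

# Rules used by each parse of "I saw the man with the telescope".
T1_RULES = [  # PP attached to the VP
    ("S", ("NP", "VP")), ("NP", ("N",)), ("VP", ("V", "NP", "PP")),
    ("NP", ("Det", "N")), ("PP", ("P", "NP")), ("NP", ("Det", "N")),
]
T2_RULES = [  # PP attached to the object NP
    ("S", ("NP", "VP")), ("NP", ("N",)), ("VP", ("V", "NP")),
    ("NP", ("NP", "PP")), ("NP", ("Det", "N")), ("PP", ("P", "NP")),
    ("NP", ("Det", "N")),
]


def tree_probability(rule_list):
    """Probability of a parse = product of the probabilities of its rules."""
    return prod(RULE_PROB[rule] for rule in rule_list)


p1, p2 = tree_probability(T1_RULES), tree_probability(T2_RULES)
print(f"T1: {p1:.4f}")  # T1: 0.0072
print(f"T2: {p2:.4f}")  # T2: 0.0016
print("Select", "T1" if p1 > p2 else "T2")  # Select T1
```

T2 uses one more rule than T1 (the low-probability NP -> NP PP), so it comes out less likely and T1 is selected.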
20Semantic Ambiguity
- Plant ambiguity
- Botanical vs Manufacturing senses
- Two types of context
- Local: 1-2 words away
- Global: several-sentence window
- Two observations (Yarowsky 1995)
- One sense per collocation (local)
- One sense per discourse (global)
21Learn Disambiguators
- Initialize: small set of seed cases
- Collect local context information
- collocations
- E.g. 2 words away from production, 1 word from seed
- Contexts become rules
- Make decision list: rules ranked by mutual info
- Iterate: Labeling via DL, collecting contexts
- Label all entries in discourse with majority sense
- Repeat
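The bootstrapping loop can be sketched as follows. This is a heavily simplified illustration, not the published algorithm. Contexts are bags of nearby words, rules are ranked by a smoothed log-ratio of counts (a simple stand-in for the mutual-information ranking on the slide), and the one-sense-per-discourse step is omitted. The data, sense names, thresholds and function names are all assumptions.

```python
import math
from collections import Counter

SENSES = ("botanical", "factory")


def build_decision_list(contexts, labels, alpha=0.1):
    """Return (score, word, sense) rules, strongest evidence first."""
    counts = {sense: Counter() for sense in SENSES}
    for ctx, sense in zip(contexts, labels):
        if sense is not None:
            counts[sense].update(ctx)
    a, b = SENSES
    rules = []
    for word in set(counts[a]) | set(counts[b]):
        # Smoothed log-ratio: large magnitude = strong evidence for one sense.
        ratio = math.log((counts[a][word] + alpha) / (counts[b][word] + alpha))
        rules.append((abs(ratio), word, a if ratio > 0 else b))
    return sorted(rules, reverse=True)


def classify(ctx, decision_list, threshold=1.0):
    """Label with the first sufficiently strong rule whose word is in ctx."""
    for score, word, sense in decision_list:
        if score >= threshold and word in ctx:
            return sense
    return None


def bootstrap(contexts, seeds, max_rounds=10):
    """Start from seed collocations, then relabel until nothing changes."""
    labels = [next((s for w, s in seeds.items() if w in ctx), None)
              for ctx in contexts]
    for _ in range(max_rounds):
        dl = build_decision_list(contexts, labels)
        new_labels = [lab if lab is not None else classify(ctx, dl)
                      for ctx, lab in zip(contexts, labels)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, build_decision_list(contexts, labels)


# Hypothetical occurrences of "plant", each with its surrounding words.
CONTEXTS = [
    {"plant", "life", "animal", "species"},
    {"manufacturing", "plant", "equipment", "workers"},
    {"species", "plant", "grows"},
    {"workers", "plant", "strike"},
    {"grows", "plant", "garden"},
]
SEEDS = {"life": "botanical", "manufacturing": "factory"}

labels, decision_list = bootstrap(CONTEXTS, SEEDS)
print(labels)
```

In this toy run the last context has no word in common with the seeds. It is labeled only in a later round, after "grows" has been learned from a newly labeled context, which is the point of iterating.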
22Disambiguate
- For each new unlabeled case,
- Use decision list to label
- > 95% accurate on a set of highly ambiguous words
- Also used for accent restoration in e-mail
23Natural Language Processing
- Goals: Understand and imitate distinctive human capacity
- Myriad applications: MT, QA, SLS
- Key Issues
- Capturing knowledge of language
- Automatic acquisition: current focus, linguistics + ML
- Resolving ambiguity, managing preference
- Apply (probabilistic) knowledge
- Effective in constrained environment