Title: Linguistically Rich Statistical Models of Language
1. Linguistically Rich Statistical Models of Language
- Joseph Smarr
- M.S. Candidate
- Symbolic Systems Program
- Advisor: Christopher D. Manning
- December 5th, 2002
2. Grand Vision
- Talk to your computer like another human
- HAL, Star Trek, etc.
- Ask your computer a question, it finds the answer
- Who's speaking at this week's SymSys Forum?
- Computer can read and summarize text for you
- What's the cutting edge in NLP these days?
3. We're Not There (Yet)
- Turns out behaving intelligently is difficult
- What does it take to achieve the grand vision?
- General Artificial Intelligence problems
- Knowledge representation, common sense reasoning, etc.
- Language-specific problems
- Complexity, ambiguity, and flexibility of language
- Always underestimated because language is so easy for us!
4. Are There Useful Sub-Goals?
- Grand vision is still too hard, but we can solve simpler problems that are still valuable
- Filter news for stories about new tech gadgets
- Take the SSP talk email and add it to my calendar
- Dial my cell phone by speaking my friend's name
- Automatically reply to customer service e-mails
- Find out which episode of The Simpsons is tonight
- Two approaches to understanding language
- Theory-driven: Theoretical Linguistics
- Task-driven: Natural Language Processing
5. Theoretical Linguistics vs. NLP
- Theoretical Linguistics
  - Goal
    - Understand people's Knowledge of language
  - Method
    - Rich logical representations of language's hidden structure and meaning
  - Guiding principles
    - Separation of (hidden) knowledge of language and (observable) performance
    - Grammaticality is categorical (all or none)
    - Describe what are possible and impossible utterances
- Natural Language Processing
  - Goal
    - Develop practical tools for analyzing speech / text
  - Method
    - Simple, robust models of everyday language use that are sufficient to perform tasks
  - Guiding principles
    - Exploit (empirical) regularities and patterns in examples of language in text collections
    - Sentence goodness is gradient (better or worse)
    - Deal with the utterances you're given, good or bad
6. Theoretical Linguistics vs. NLP
[Diagram contrasting Linguistics and NLP side by side]
7. Linguistic Puzzle
- When dropping an argument, why do some verbs keep the subject and some keep the object?
- John sang the song → John sang
- John broke the vase → The vase broke
- Not just quirkiness of language
- Similar patterns show up in other languages
- Seems to involve deep aspects of verb meaning
- Rules to account for this phenomenon
- Two classes of verbs (unergative vs. unaccusative)
- Remaining argument must be realized as subject
8. Exception: Imperatives
- "Open the pod bay doors, HAL"
- Different goals lead to study of different problems. In NLP...
- Need to recognize this as a command
- Need to figure out what specific action to take
- Irrelevant how you'd say it in French
- Describing language vs. working with language
- But both tasks clearly share many sub-problems
9. Theoretical Linguistics vs. NLP
- Potential for much synergy between linguistics and NLP
- However, historically they have remained quite distinct
- Chomsky (founder of generative grammar)
  - "It must be recognized that the notion 'probability of a sentence' is an entirely useless one, under any known interpretation of this term."
- Karttunen (founder of finite state technologies at Xerox)
  - "Linguists' reaction to NLP: Not interested. You do not understand Theory. Go away you geek."
- Jelinek (former head of IBM speech project)
  - "Every time I fire a linguist, the performance of our speech recognition system goes up."
10. Potential Synergies
- Lexical acquisition (unknown words)
- Statistically infer new lexical entries from context
- Modeling naturalness and conventionality
- Use corpus data to weight constructions
- Dealing with ungrammatical utterances
- Find most similar / most likely correction
- Richer patterns for finding information in text
- Use argument structure / semantic dependencies
- More powerful models for speech recognition
- Progressively build parse tree while listening
11. Finding Information in Text
- US Government has sponsored lots of research in information extraction from news articles
- Find mentions of terrorists and which locations they're targeting
- Find which companies are being acquired by which others and for how much
- Progress driven by simplifying the models used
- Early work used rich linguistic parsers
- Unable to robustly handle natural text
- Modern work is mainly finite state patterns
- Regular expressions are very practical and successful
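As an illustration of how far a single finite-state pattern can go, here is a minimal sketch of regex-based extraction for "X will acquire Y for Z" style sentences. The pattern, function name, and capitalized-sequence heuristic for company names are inventions for this example, not the systems discussed in the talk:

```python
import re

# One finite-state IE pattern: purchaser, acquired company, and amount.
# Company names are approximated as runs of capitalized words (an
# assumption of this sketch; real systems use many hand-tuned patterns).
ACQ = re.compile(
    r"(?P<purchaser>[A-Z][A-Za-z.]*(?:\s+[A-Z][A-Za-z.]*)*)\s+"
    r"(?:will\s+acquire|acquires|acquired)\s+"
    r"(?P<acquired>[A-Z][A-Za-z.]*(?:\s+[A-Z][A-Za-z.]*)*)"
    r"(?:\s+for\s+(?P<amount>[\w\s]+(?:dollars|million|billion)))?"
)

def extract_acquisitions(text):
    """Return (purchaser, acquired, amount) tuples found in text."""
    return [(m.group("purchaser"), m.group("acquired"), m.group("amount"))
            for m in ACQ.finditer(text)]
```

Patterns like this are brittle on free-form prose, which is exactly the scaling problem the next slides address.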
12. Web Information Extraction
- How much does that text book cost on Amazon?
- Learn patterns for finding relevant fields
  Concept:   Book
  Title:     Foundations of Statistical Natural Language Processing
  Author(s): Christopher D. Manning, Hinrich Schütze
  Price:     58.45
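One classic way to "learn patterns for finding relevant fields" on templated web pages is wrapper induction: from a page where the field value is known, record the text immediately before and after it, then reuse those delimiters on new pages. A toy sketch (the delimiter width and the page snippets are assumptions of this example):

```python
def learn_wrapper(page, value, width=10):
    """Record the `width` characters before and after a known field value."""
    i = page.index(value)
    return page[max(0, i - width):i], page[i + len(value):i + len(value) + width]

def apply_wrapper(page, prefix, suffix):
    """Extract the text between the learned delimiters on a new page."""
    start = page.index(prefix) + len(prefix)
    return page[start:page.index(suffix, start)]
```

This works precisely because machine-generated pages are highly regular, which is why web IE is so much easier than IE from natural prose.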
13. Improving IE Performance on Natural Text Documents
- How can we scale IE back up for natural text?
- Need to look elsewhere for regularities to exploit
- Idea: Consider grammatical structure
- Run shallow parser on each sentence
- Flatten output into sequence of typed chunks
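The flattening step above can be sketched as follows, assuming the shallow parser emits POS-tagged tokens: adjacent tokens whose tags map to the same chunk type are merged into one typed chunk. The tag-to-chunk table here is a simplified assumption for illustration, not the actual system's grammar:

```python
# Simplified tag-to-chunk mapping (a real shallow parser uses a learned
# or hand-built grammar over the full Penn Treebank tag set).
CHUNK_TYPE = {
    "DT": "NP", "JJ": "NP", "NN": "NP", "NNS": "NP", "NNP": "NP", "CD": "NP",
    "MD": "VP", "VB": "VP", "VBD": "VP", "VBZ": "VP",
    "IN": "PP",
}

def flatten_to_chunks(tagged):
    """Merge adjacent tokens of the same chunk type into typed chunks."""
    chunks = []
    for word, tag in tagged:
        ctype = CHUNK_TYPE.get(tag, "O")  # "O" = outside any chunk
        if chunks and chunks[-1][0] == ctype:
            chunks[-1][1].append(word)
        else:
            chunks.append((ctype, [word]))
    return [(ctype, " ".join(words)) for ctype, words in chunks]
```

The resulting flat chunk sequence can then be matched with the same finite-state patterns used for raw text, but over linguistically meaningful units.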
14. Power of Linguistic Features
[Chart: adding linguistic features yields 21%, 65%, and 45% increases in performance]
15. Linguistically Rich(er) IE
- Exploit more grammatical structure for patterns
- e.g. Tim Grow's work on IE with PCFGs
[Parse tree for "First Union Corp will acquire Sheland Bank Inc for three million dollars", with constituents annotated by semantic roles: NP(pur) = "First Union Corp", VP(acq, amt) headed by "will acquire", NP(acq) = "Sheland Bank Inc", PP(amt) containing NP(amt) = "three million dollars", and S(pur, acq, amt) at the root]
16. Classifying Unknown Words
- Which of the following is the name of a city?
- Most linguistic grammars assume a fixed lexicon
- How do humans learn to deal with new words?
- Context ("I spent a summer living in Wethersfield")
- Makeup of the word itself (phonesthetics)
- Idea: Learn distinguishing letter sequences
17. What's in a Name?
18. Generative Model of PNPs
- Length n-gram model and word model:
  P(pnp | c) = P_n-gram(word-lengths(pnp)) · ∏_{w_i ∈ pnp} P(w_i | word-length(w_i))
- Word model: mixture of character n-gram model and common word model:
  P(w_i | len) = λ_len · P_n-gram(w_i | len)^(k/len) + (1 − λ_len) · P_word(w_i | len)
- N-gram models: deleted interpolation:
  P_0-gram(symbol | history) = uniform distribution
  P_n-gram(s | h) = λ_C(h) · P_empirical(s | h) + (1 − λ_C(h)) · P_(n−1)-gram(s | h)
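The deleted-interpolation character n-gram component of the model can be sketched as below. This is only the n-gram piece: the word-length model, common-word mixture, and length exponent are omitted, a fixed lambda stands in for the count-bucketed λ_C(h), and the alphabet is an assumption of this sketch:

```python
from collections import defaultdict, Counter
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz^$"  # assumed symbol set for this sketch

class CharNGram:
    """Character n-gram model with deleted interpolation:
    P_k(s|h) = lam * P_empirical(s|h) + (1 - lam) * P_{k-1}(s|h),
    bottoming out in a uniform 0-gram distribution."""

    def __init__(self, n, lam=0.8):
        self.n, self.lam = n, lam
        # ngram[k][context][symbol] = count, where context has length k-1
        self.ngram = [defaultdict(Counter) for _ in range(n + 1)]

    def train(self, words):
        for w in words:
            padded = "^" * (self.n - 1) + w.lower() + "$"
            for i in range(self.n - 1, len(padded)):
                for k in range(1, self.n + 1):
                    self.ngram[k][padded[i - k + 1:i]][padded[i]] += 1

    def prob(self, sym, hist, k=None):
        if k is None:
            k = self.n
        if k == 0:
            return 1.0 / len(ALPHABET)       # uniform base case
        ctx = hist[len(hist) - (k - 1):]     # last k-1 symbols of history
        counts = self.ngram[k][ctx]
        total = sum(counts.values())
        emp = counts[sym] / total if total else 0.0
        lam = self.lam if total else 0.0     # back off fully on unseen context
        return lam * emp + (1 - lam) * self.prob(sym, hist, k - 1)

    def logprob(self, word):
        """Log probability of a whole word under the character model."""
        padded = "^" * (self.n - 1) + word.lower() + "$"
        return sum(math.log(self.prob(padded[i], padded[i - self.n + 1:i]))
                   for i in range(self.n - 1, len(padded)))
```

To classify an unknown word, one such model is trained per category (cities, drug names, etc.) and the category whose model assigns the highest `logprob` wins.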
19. Experimental Results
20. Knowledge of Frequencies
- Linguistics traditionally assumes Knowledge of Language doesn't involve counting
- Letter frequencies are clearly an important source of knowledge for unknown words
- Similarly, we saw before that there are regular patterns to exploit in grammatical information
- Take home point
  - Combining Statistical NLP methods with richer linguistic representations is a big win!
21. Language is Ambiguous!
- "Ban on Nude Dancing on Governor's Desk" (from a Georgia newspaper column discussing current legislation)
- "Lebanese chief limits access to private parts" (talking about an Army General's initiative)
- "Death may ease tension" (an article about the death of Colonel Jean-Claude Paul in Haiti)
- Iraqi Head Seeks Arms
- Juvenile Court to Try Shooting Defendant
- Teacher Strikes Idle Kids
- Stolen Painting Found By Tree
22. Language is Ambiguous!
- Local HS Dropouts Cut in Half
- Obesity Study Looks for Larger Test Group
- British Left Waffles on Falkland Islands
- Red Tape Holds Up New Bridges
- Man Struck by Lightning Faces Battery Charge
- Clinton Wins on Budget, but More Lies Ahead
- Hospitals Are Sued by 7 Foot Doctors
- Kids Make Nutritious Snacks
23. Coping With Ambiguity
- Categorical grammars like HPSG provide many possible analyses for sentences
- 455 parses for "List the sales of the products produced in 1973 with the products produced in 1972." (Martin et al., 1987)
- In most cases, only one interpretation is intended
- Initial solution was hand-coded preferences among rules
- Hard to manage as number of rules increases
- Need to capture interactions among rules
24. Statistical HPSG Parse Selection
- HPSG provides deep analyses of sentence structure and meaning
- Useful for NLP tasks like question answering
- Need to solve disambiguation problem to make using these richer representations practical
- Idea: Learn statistical preferences among constructions from hand-disambiguated collection of sentences
- Result: Correct analysis chosen >80% of the time
- StatNLP methods + linguistic representation = Win
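The selection step can be sketched as a linear model over construction features: each candidate analysis is reduced to feature counts, scored against weights, and the highest-scoring analysis is chosen. The feature names and weights below are invented for illustration; the actual system learns such preferences from the hand-disambiguated sentences:

```python
def score(features, weights):
    """Linear score: sum of weight * count over a parse's features."""
    return sum(weights.get(f, 0.0) * count for f, count in features.items())

def select_parse(candidates, weights):
    """candidates: list of (parse, feature_counts); return the best parse."""
    return max(candidates, key=lambda c: score(c[1], weights))[0]
```

The appeal of this setup is that feature weights capture interactions that hand-ranked rules could not, and adding a new preference is just adding a feature.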
25. Towards Semantic Extraction
- HPSG provides representation of meaning
- Who did what to whom?
- Computers need meaning to do inference
- Can we extend information extraction methods to extract meaning representations from pages?
- Current project: IE for the semantic web
- Large project to build rich ontologies to describe the content of web pages for intelligent agents
- Use IE to extract new instances of concepts from web pages (as opposed to manual labeling)
- student(Joseph), univ(Stanford), at(Joseph, Stanford)
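A toy sketch of that last step, turning a matched sentence into predicate-style ontology facts. The pattern and predicate names are illustrative assumptions; a real system would be driven by the ontology's concept definitions:

```python
import re

# Hypothetical extraction pattern: "<Person> is a student at <University>".
STUDENT_AT = re.compile(r"(?P<person>[A-Z]\w+) is a student at (?P<univ>[A-Z]\w+)")

def extract_facts(text):
    """Emit predicate-style facts for each pattern match."""
    facts = []
    for m in STUDENT_AT.finditer(text):
        person, univ = m.group("person"), m.group("univ")
        facts += [f"student({person})", f"univ({univ})", f"at({person}, {univ})"]
    return facts
```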
26. Towards the Grand Vision?
- Collaboration between Theoretical Linguistics and NLP is an important step forward
- Practical tools with sophisticated language power
- How can we ever teach computers enough about language and the world?
- Hawking: Moore's Law is sufficient
- Moravec: mobile robots must learn like children
- Kurzweil: reverse-engineer the human brain
- The experts agree: Symbolic Systems is the future!
27. Upcoming Convergence Courses
- Ling 139M: Machine Translation (Win)
- Ling 239E: Grammar Engineering (Win)
- CS 276B: Text Information Retrieval (Win)
- Ling 239A: Parsing and Generation (Spr)
- CS 224N: Natural Language Processing (Spr)
Get Involved!!