Title: Pub Quiz
1Pub Quiz
2Question Answering
- Lecture 1 (Last week): Introduction, History of QA, Architecture of a QA system, Evaluation.
- Lecture 2 (Today): Question Classification, NLP techniques for question analysis, Tokenisation, Lemmatisation, POS-tagging, Parsing, WordNet.
- Lecture 3 (Next lecture): Retrieving Answers, Document pre-processing, Named Entity Recognition, Anaphora Resolution, Matching, Reranking, Sanity checking.
3Architecture of a QA system
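[Diagram: the question goes to Question Analysis, which produces a query, an answer-type and a question representation. IR uses the query to retrieve documents/passages from the corpus. Document Analysis turns the documents/passages into a passage representation. Answer Extraction combines the answer-type, the question representation and the passage representation to produce the answers.]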
5Syntactically Distinguishing Questions
- Wh-questions
  - Where was Franz Kafka born?
  - How many countries are members of OPEC?
  - Who is Thom Yorke?
  - Why did David Koresh ask the FBI for a word processor?
  - How did Frank Zappa die?
  - Which boxer beat Muhammad Ali?
6Syntactically Distinguishing Questions
- Yes-no questions
  - Does light have weight?
  - Scotland is part of England: true or false?
- Choice-questions
  - Did Italy or Germany win the world cup in 1982?
  - Who is Harry Potter's best friend: Ron, Hermione or Sirius?
7Syntactically Distinguishing Questions
- Imperative
  - Name four European countries that produce wine.
  - Give the date of birth of Franz Kafka.
- Declarative
  - I would like to know when Jim Morrison was born.
8Semantically Distinguishing Questions
- Divide questions according to their expected answer type
- Simple Answer-Type Typology
  - PERSON
  - NUMERAL
  - DATE
  - MEASURE
  - LOCATION
  - ORGANISATION
  - ENTITY
11Expected Answer Types
- DATE
  - When was JFK killed?
  - In what year did Rome become the capital of Italy?
- PERSON
  - Who won the Nobel prize for Peace?
  - Which rock singer wrote Lithium?
- NUMERAL
  - How many inhabitants does Rome have?
  - What's the population of Scotland?
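A minimal sketch of how the expected answer type could be guessed from the opening words of a question. The rule table below is an illustrative assumption, not part of the lecture. Questions such as "Which rock singer wrote Lithium?" are deliberately left unclassified, because they need lexical knowledge about the noun after "which".

```python
import re

# Hypothetical surface rules: (pattern on the lower-cased question, answer type).
RULES = [
    (r"^(when|in what year)\b", "DATE"),
    (r"^(who|whom)\b", "PERSON"),
    (r"^how many\b", "NUMERAL"),
    (r"^where\b", "LOCATION"),
]


def expected_answer_type(question):
    """Return the first matching answer type, or None if no rule applies."""
    q = question.strip().lower()
    for pattern, answer_type in RULES:
        if re.search(pattern, q):
            return answer_type
    return None


for q in ["When was JFK killed?",
          "Who won the Nobel prize for Peace?",
          "How many inhabitants does Rome have?",
          "Which rock singer wrote Lithium?"]:
    print(q, "->", expected_answer_type(q))
```

The last question returns None here, which is why the wh-word alone is not enough.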
12Focus and Topic
- Information expressed in a question can be structured into two parts
  - the focus: the information that is asked for
  - the topic: information about the focus
- Example
  - How many inhabitants does Rome have?
  - FOCUS: How many inhabitants; TOPIC: Rome
13We need to know how to process natural language!
14Architecture of a QA system
[Diagram: QA system architecture, as on slide 3]
15Generating Query Terms
- Example 1
  - Question: Who discovered prions?
  - Text A: Dr. Stanley Prusiner received the Nobel prize for the discovery of prions.
  - Text B: Prions are a kind of protein that ...
- Query terms?
16Generating Query Terms
- Example 2
  - Question: When did Franz Kafka die?
  - Text A: Kafka died in 1924.
  - Text B: Dr. Franz died in 1971.
- Query terms?
17Generating Query Terms
- Example 3
  - Question: How did actor James Dean die?
  - Text: James Dean was killed in a car accident.
- Query terms?
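To make the difficulty in these three examples concrete, here is a naive sketch that builds query terms by dropping a small stop-word list and then checks which terms literally occur in a text. The stop-word list and the function names are assumptions made for illustration only.

```python
import re

# Hypothetical stop-word list: question words, auxiliaries and function words.
STOP_WORDS = {"who", "when", "how", "what", "which", "where",
              "did", "is", "was", "the", "a", "an", "of", "in"}


def words(text):
    """Lower-case the text and return its alphanumeric word forms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def query_terms(question):
    """Keep every word of the question that is not a stop word."""
    return [w for w in words(question) if w not in STOP_WORDS]


def matched_terms(terms, text):
    """Return the query terms that occur literally in the text."""
    present = set(words(text))
    return [t for t in terms if t in present]


terms = query_terms("When did Franz Kafka die?")
print(terms)
print(matched_terms(terms, "Kafka died in 1924."))
print(matched_terms(terms, "Dr. Franz died in 1971."))
```

With literal matching, "die" never matches "died" and "discovered" never matches "discovery", and the wrong text (Dr. Franz) scores as well as the right one. Which terms to keep, and in which form, is exactly the open question on these slides.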
18We need to know how to process natural language!
19Architecture of a QA system
[Diagram: QA system architecture, as on slide 3]
23Difference in structure
- Example
  - Question: When did Franz Kafka die?
  - Text A: The mother of Franz Kafka died in 1918.
  - Text B: Kafka died in 1924.
  - Text C: Both Kafka and Lenin died in 1924.
  - Text D: Max Brod, a friend of Kafka, died in 1930.
24We need to know how to process natural language!
25Natural Language is messy!
- We need ways to automate the process of manipulating natural language
  - Punctuation
  - The way words are composed
  - The relationship between wordforms
  - The relationship between words
  - The structure of phrases
- This is where NLP (Natural Language Processing) comes in!
26NLP Techniques
- Tokenisation
- Lemmatisation
- Part of Speech Tagging
- Syntactic analysis (parsing)
- WordNet
28Tokenisation
- Tokenisation is the task that splits words from punctuation
  - semicolons, colons ; :
  - exclamation marks, question marks ! ?
  - commas and full stops , .
  - quotes " '
- Tokens are normally split by spaces
30Tokenisation Example 1
- Input (9 tokens)
  - When was the Buckingham Palace built in London, England?
- Output (11 tokens)
  - When was the Buckingham Palace built in London , England ?
32Tokenisation Example 2
- Input (7 tokens)
  - What year did "Snow White" come out?
- Output (10 tokens)
  - What year did " Snow White " come out ?
33Tokenisation combined words
- Combined words are split
  - I'd → I 'd
  - country's → country 's
  - won't → will n't
  - don't! → do n't !
- Some Italian examples
  - gliel'ha detto → gli lo ha detto
  - posso prenderlo → posso prendere lo
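The English splitting rules above can be sketched with a regular expression plus a small table of irregular contractions. This is only an illustration under simple assumptions: the clitic list and the IRREGULAR table are hypothetical and far from complete, the Italian cases are not handled, and abbreviations such as "U.S." are split incorrectly.

```python
import re

# A word (optionally with internal apostrophes) or a single punctuation mark.
TOKEN_RE = re.compile(r"\w+(?:'\w+)*|[^\w\s]")

# Hypothetical table for contractions whose stem changes when split.
IRREGULAR = {"won't": ["will", "n't"], "can't": ["can", "n't"]}

# Clitics that are split off the end of a word.
CLITIC_RE = re.compile(r"(\w+)('(?:s|d|ll|re|ve|m))", re.IGNORECASE)


def split_combined(word):
    """Split a combined word such as "don't" or "I'd" into its parts."""
    lower = word.lower()
    if lower in IRREGULAR:
        return list(IRREGULAR[lower])
    if lower.endswith("n't") and len(word) > 3:
        return [word[:-3], word[-3:]]
    match = CLITIC_RE.fullmatch(word)
    if match:
        return [match.group(1), match.group(2)]
    return [word]


def tokenise(text):
    """Separate words from punctuation, then split combined words."""
    tokens = []
    for raw in TOKEN_RE.findall(text):
        tokens.extend(split_combined(raw))
    return tokens


print(tokenise("When was the Buckingham Palace built in London, England?"))
print(tokenise('What year did "Snow White" come out?'))
print(tokenise("don't!"))
```

The first two calls give the 11 and 10 tokens of Tokenisation Examples 1 and 2.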
34Difficulties with tokenisation
- Abbreviations, acronyms
  - When was the U.S. invasion of Haiti?
- In particular if the abbreviation or acronym is the last word of a sentence
- Look at the next word: if it starts with an uppercase letter, then assume it is the end of the sentence
- But think of cases such as Mr. Jones
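The uppercase heuristic and its "Mr. Jones" exception can be written down as follows. The TITLES set is an assumption added for illustration; the heuristic still fails on cases such as "the U.S. Army", where a capitalised word follows an abbreviation inside a sentence.

```python
# Hypothetical list of abbreviations that are normally followed by a name.
TITLES = {"Mr.", "Mrs.", "Ms.", "Dr.", "Prof."}


def sentence_ends_after(words, i):
    """Guess whether the full stop at the end of words[i] ends a sentence."""
    word = words[i]
    if not word.endswith("."):
        return False
    if i + 1 == len(words):
        return True          # nothing follows, so the text ends here
    if word in TITLES:
        return False         # "Mr. Jones": the capital belongs to a name
    return words[i + 1][0].isupper()


print(sentence_ends_after("When was the U.S. invasion of Haiti?".split(), 3))
print(sentence_ends_after("He moved to the U.S. She stayed.".split(), 4))
print(sentence_ends_after("I met Mr. Jones".split(), 2))
```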
35Why is tokenisation important?
- To look up a word in an electronic dictionary (such as WordNet)
- For all subsequent stages of processing
  - Lemmatisation
  - Parsing
36NLP Techniques
- Tokenisation
- Lemmatisation
- Part of Speech Tagging
- Syntactic analysis (parsing)
- WordNet
37Lemmatisation
- Lemmatising means
  - grouping morphological variants of words under a single headword
- For example, you could take the words
  - am, was, are, is, were, and been together under the word be
39Lemmatisation
- Using linguistic terminology, the variants taken together form a lexeme, and the headword is its lemma
- Lexeme: a lexical unit, an abstraction over specific constructions
- Other examples
  - dying, die, died, dies → die
  - car, cars → car
  - man, men → man
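A lemmatiser can be sketched as a lookup from word forms to headwords. The table below contains only the examples from these slides and is an assumption for illustration; a real lemmatiser needs a full morphological lexicon or rules, and usually the part of speech as well.

```python
from collections import defaultdict

# Word form -> headword, restricted to the examples on the slides.
LEMMAS = {
    "am": "be", "was": "be", "are": "be", "is": "be", "were": "be", "been": "be",
    "dying": "die", "died": "die", "dies": "die",
    "cars": "car",
    "men": "man",
}


def lemmatise(word):
    """Return the headword of a word form; unknown forms map to themselves."""
    lower = word.lower()
    return LEMMAS.get(lower, lower)


def group_by_lemma(words):
    """Group word forms under their headword."""
    groups = defaultdict(list)
    for word in words:
        groups[lemmatise(word)].append(word)
    return dict(groups)


print(group_by_lemma(["die", "died", "dies", "car", "cars", "was", "is"]))
```

After lemmatisation, the question word "die" and the text word "died" map to the same headword, so they can be matched.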
40NLP Techniques
- Tokenisation
- Lemmatisation
- Part of Speech Tagging
- Syntactic analysis (parsing)
- WordNet
41Traditional parts of speech
- Verb
- Noun
- Pronoun
- Adjective
- Adverb
- Preposition
- Conjunction
- Interjection
42Parts of speech in NLP
- CLAWS1 (132 tags)
- Examples
  - NN: singular common noun (boy, pencil ...)
  - NN$: genitive singular common noun (boy's, parliament's ...)
  - NNP: singular common noun with word initial capital (Austrian, American, Sioux, Eskimo ...)
  - NNP$: genitive singular common noun with word initial capital (Sioux', Eskimo's, Austrian's, American's ...)
  - NNPS: plural common noun with word initial capital (Americans ...)
  - NNPS$: genitive plural common noun with word initial capital (Americans' ...)
  - NNS: plural common noun (pencils, skeletons, days, weeks ...)
  - NNS$: genitive plural common noun (boys', weeks' ...)
  - NNU: abbreviated unit of measurement unmarked for number (in, cc, kg ...)
- Penn Treebank (45 tags)
- Examples
  - JJ: adjective (green ...)
  - JJR: adjective, comparative (greener ...)
  - JJS: adjective, superlative (greenest ...)
  - MD: modal (could, will ...)
  - NN: noun, singular or mass (table ...)
  - NNS: noun, plural (tables ...)
  - NNP: proper noun, singular (John ...)
  - NNPS: proper noun, plural (Vikings ...)
  - PDT: predeterminer (both the boys)
  - POS: possessive ending (friend's)
  - PRP: personal pronoun (I, he, it ...)
  - PRP$: possessive pronoun (my, his ...)
  - RB: adverb (however, usually, naturally, here, good ...)
  - RBR: adverb, comparative (better ...)
44POS tagged example
- What WP
- year NN
- did VBD
- "
- Snow NNP
- White NNP
- "
- come VB
- out IN
- ? .
45Why is POS-tagging important?
- To disambiguate words
- For instance, to distinguish "book" used as a noun from "book" used as a verb
  - I like that book
  - Did you book a room?
- Prerequisite for further processing stages, such as parsing
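As a toy illustration of the "book" example, the sketch below picks a Penn Treebank tag by looking only at the preceding word. The two word lists and the rule itself are assumptions made for this example; real POS taggers are trained on annotated corpora and use much richer context.

```python
# Hypothetical context words for the toy rule.
DETERMINERS = {"a", "an", "the", "this", "that"}
SUBJECT_PRONOUNS = {"i", "you", "we", "they", "he", "she"}


def tag_noun_or_verb(words, i):
    """Tag words[i] as NN or VB from the previous word, or None if unsure."""
    previous = words[i - 1].lower() if i > 0 else ""
    if previous in DETERMINERS:
        return "NN"   # "that book": a determiner introduces a noun
    if previous in SUBJECT_PRONOUNS or previous == "to":
        return "VB"   # "you book", "to book": a verb follows
    return None


print(tag_noun_or_verb("I like that book".split(), 3))
print(tag_noun_or_verb("Did you book a room ?".split(), 2))
```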
46NLP Techniques
- Tokenisation
- Lemmatisation
- Part of Speech Tagging
- Syntactic analysis (parsing)
- WordNet
47What is Parsing
- Parsing is the process of assigning a syntactic structure to a sequence of words
- The syntactic structure is defined using a grammar
- A grammar consists of a set of symbols (terminal and non-terminal symbols) and production rules (grammar rules)
- The lexicon is built over the terminal symbols (i.e., the words)
48Syntactic Categories
- The non-terminal symbols correspond to syntactic categories
  - Det (determiner)
  - N (noun)
  - IV (intransitive verb)
  - TV (transitive verb)
  - PN (proper name)
  - Prep (preposition)
  - NP (noun phrase): the car
  - PP (prepositional phrase): at the table
  - VP (verb phrase): saw a car
  - S (sentence): Mia likes Vincent
49Example Grammar
- Lexicon
  - Det: which, a, the, ...
  - N: rock, singer, ...
  - IV: die, walk, ...
  - TV: kill, write, ...
  - PN: John, Lithium, ...
  - Prep: on, from, to, ...
- Grammar Rules
  - S → NP VP
  - NP → Det N
  - NP → PN
  - N → N N
  - N → N PP
  - VP → TV NP
  - VP → IV
  - PP → Prep NP
  - VP → VP PP
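The example grammar can be tried out with a small bottom-up chart parser. This is a sketch under stated assumptions, not the parser used in the lecture: it fills a chart in CKY style instead of backtracking, keeps only the first analysis found for each category and span, and adds the inflected form "wrote" to the lexicon because the slide lists only "write".

```python
from collections import defaultdict

# Lexicon from the slide; "wrote" is an added assumption (inflected form of write).
LEXICON = {
    "which": "Det", "a": "Det", "the": "Det",
    "rock": "N", "singer": "N",
    "die": "IV", "walk": "IV",
    "kill": "TV", "write": "TV", "wrote": "TV",
    "john": "PN", "lithium": "PN",
    "on": "Prep", "from": "Prep", "to": "Prep",
}
# Grammar rules from the slide, split into binary and unary rules.
BINARY = [("S", "NP", "VP"), ("NP", "Det", "N"), ("N", "N", "N"),
          ("N", "N", "PP"), ("VP", "TV", "NP"), ("PP", "Prep", "NP"),
          ("VP", "VP", "PP")]
UNARY = [("NP", "PN"), ("VP", "IV")]


def add_unary(cell):
    """Apply the unary rules to a chart cell until nothing new is added."""
    changed = True
    while changed:
        changed = False
        for parent, child in UNARY:
            if child in cell and parent not in cell:
                cell[parent] = (parent, cell[child])
                changed = True


def parse(words):
    """Return a parse tree rooted in S as nested tuples, or None."""
    n = len(words)
    chart = defaultdict(dict)  # (start, end) -> {category: tree}
    for i, word in enumerate(words):
        category = LEXICON.get(word.lower())
        if category:
            chart[i, i + 1][category] = (category, word)
        add_unary(chart[i, i + 1])
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            k = i + span
            cell = chart[i, k]
            for j in range(i + 1, k):
                for parent, left, right in BINARY:
                    if (left in chart[i, j] and right in chart[j, k]
                            and parent not in cell):
                        cell[parent] = (parent, chart[i, j][left],
                                        chart[j, k][right])
            add_unary(cell)
    return chart[0, n].get("S")


print(parse("Which rock singer wrote Lithium".split()))
```

Because the chart stores every constituent that can be built over every span, the dead end "NP over Which rock" simply stays unused, and no explicit backtracking is needed.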
50The Parser
- A parser automates the process of parsing
- The input of the parser is a string of words (possibly annotated with POS-tags)
- The output of a parser is a parse tree, connecting all the words
- The way a parse tree is constructed is also called a derivation
51Derivation Example
- Which rock singer wrote Lithium
52Lexical stage
- [Det Which] [N rock] [N singer] [TV wrote] [PN Lithium]
53Use rule NP → Det N
- [NP [Det Which] [N rock]] [N singer] [TV wrote] [PN Lithium]
54Use rule NP → PN
- [NP [Det Which] [N rock]] [N singer] [TV wrote] [NP [PN Lithium]]
55Use rule VP → TV NP
- [NP [Det Which] [N rock]] [N singer] [VP [TV wrote] [NP [PN Lithium]]]
56Backtracking
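- [NP [Det Which] [N rock]] [N singer] [VP [TV wrote] [NP [PN Lithium]]]
- No rule combines the NP over "Which rock" with the following N "singer", so that NP is undone and another rule is tried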
57Use rule N → N N
- [Det Which] [N [N rock] [N singer]] [VP [TV wrote] [NP [PN Lithium]]]
58Use rule NP → Det N
- [NP [Det Which] [N [N rock] [N singer]]] [VP [TV wrote] [NP [PN Lithium]]]
59Use rule S → NP VP
- [S [NP [Det Which] [N [N rock] [N singer]]] [VP [TV wrote] [NP [PN Lithium]]]]
60Syntactic head
- [S [NP [Det Which] [N [N rock] [N singer]]] [VP [TV wrote] [NP [PN Lithium]]]]
61Parse Tree (another example)
- [S [NP [Det The] [N [N mother] [PP [Prep of] [NP [PN Franz] [PN Kafka]]]]] [VP [VP [IV died]] [PP [Prep in] [NP 1918]]]]
62Syntactic head
- [S [NP [Det The] [N [N mother] [PP [Prep of] [NP [PN Franz] [PN Kafka]]]]] [VP [VP [IV died]] [PP [Prep in] [NP 1918]]]]
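One way to see why the syntactic head matters is to compute it from the parse tree. The head rules below are an assumption (a simplified version of commonly used head tables) and are not given on the slides. The tree is the parse of Text A, written as nested tuples.

```python
# Hypothetical head rules: for each category, the child categories that may
# carry the head, in order of preference.
HEAD_PRIORITY = {
    "S": ["VP"],
    "VP": ["VP", "TV", "IV"],
    "NP": ["N", "PN"],
    "N": ["N"],
    "PP": ["Prep"],
}

TREE = ("S",
        ("NP", ("Det", "The"),
               ("N", ("N", "mother"),
                     ("PP", ("Prep", "of"),
                            ("NP", ("PN", "Franz"), ("PN", "Kafka"))))),
        ("VP", ("VP", ("IV", "died")),
               ("PP", ("Prep", "in"), ("NP", "1918"))))


def head_word(tree):
    """Return the head word of a tree given as (category, child, ...)."""
    category, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return children[0]                      # lexical node: (category, word)
    for wanted in HEAD_PRIORITY.get(category, []):
        matches = [c for c in children if c[0] == wanted]
        if matches:
            return head_word(matches[-1])       # rightmost matching child
    return head_word(children[-1])              # fallback: rightmost child


print(head_word(TREE))      # head of the sentence
print(head_word(TREE[1]))   # head of the subject noun phrase
```

Under these rules the subject of "died" is headed by "mother", not by "Kafka", which is why Text A does not answer "When did Franz Kafka die?" even though it contains all the query words.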
63Using a parser
- Normally expects tokenised and POS-tagged input
- Examples of wide-coverage parsers
  - Charniak parser
  - Collins parser
  - RASP (Carroll & Briscoe)
  - CCG parser (Clark & Curran)
64NLP Techniques
- Tokenisation
- Lemmatisation
- Part of Speech Tagging
- Syntactic analysis (parsing)
- WordNet
65WordNet
- Electronic dictionary
- Not only words and definitions, but also relations between words
- Four parts of speech
  - Nouns
  - Verbs
  - Adjectives
  - Adverbs
66WordNet SynSets
- Words are organised in SynSets
- A SynSet is a group of words with the same meaning, in other words, a set of synonyms
- Example: Rome, Roma, Eternal City, Italian Capital, capital of Italy
67Senses
- A word can have several different meanings
- Example: plant
  - A building for industrial labour
  - A living organism lacking the power of locomotion
- The different meanings of a word are called senses
- Therefore, one word can occur in more than one SynSet in WordNet
68SynSet Example
- mug, mugful: the quantity that can be held in a mug
- chump, fool, gull, mark, patsy, fall guy, sucker, soft touch, mug: a person who is gullible and easy to take advantage of
- countenance, physiognomy, phiz, visage, kisser, smiler, mug: the human face
69Hypernyms
- Hypernymy is a WordNet relation defined between two SynSets
- If A is a hypernym of B, then A is more generic than B
- The inverse of hypernymy is hyponymy
- If A is a hyponym of B, then A is more specific than B
- Use these relations transitively
- Examples
  - cow and horse are hyponyms of animal
  - publication is a hypernym of book
70Examples using WordNet
- Which rock singer ...
  - singer is a hyponym of person, therefore expected answer type is PERSON
- What is the population of ...
  - population is a hyponym of number, hence answer type NUMERAL
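The transitive use of the hypernym relation can be sketched with a hand-made hierarchy. The HYPERNYM table below is a tiny illustrative stand-in, not real WordNet data; the intermediate nodes and the mapping to answer types are assumptions.

```python
# Toy hierarchy: each word points to one more generic word (its hypernym).
HYPERNYM = {
    "singer": "musician", "musician": "performer", "performer": "person",
    "population": "number",
    "cow": "animal", "horse": "animal",
    "book": "publication",
}
# Hypothetical mapping from generic concepts to answer types.
ANSWER_TYPES = {"person": "PERSON", "number": "NUMERAL"}


def hypernyms(word):
    """Follow the hypernym relation transitively and return the chain."""
    chain, seen = [], set()
    while word in HYPERNYM and word not in seen:
        seen.add(word)          # guard against accidental cycles
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def answer_type(noun):
    """Return the answer type of the first concept on the chain that has one."""
    for concept in [noun] + hypernyms(noun):
        if concept in ANSWER_TYPES:
            return ANSWER_TYPES[concept]
    return None


print(hypernyms("singer"))
print(answer_type("singer"), answer_type("population"), answer_type("cow"))
```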
71How to use NLP tools?
- There is a large set of tools available on the web, most of them free for research
- Examples of integrated text processing environments
  - GATE (University of Sheffield)
  - TTT (University of Edinburgh)
  - LingPipe
- For a general overview of NLP tools, see http://registry.dfki.de/
72Question Answering
- Lecture 1 (Last week): Introduction, History of QA, Architecture of a QA system, Evaluation.
- Lecture 2 (Today): Question Classification, NLP techniques for question analysis, Tokenisation, Lemmatisation, POS-tagging, Parsing, WordNet.
- Lecture 3 (Next lecture): Retrieving Answers, Document pre-processing, Named Entity Recognition, Anaphora Resolution, Matching, Reranking, Sanity checking.