Pub Quiz - PowerPoint PPT Presentation

About This Presentation
Title:

Pub Quiz

Description:

Pub Quiz Question Answering Lecture 1 (Last week): Introduction; History of QA; Architecture of a QA system; Evaluation. Lecture 2 (Today): Question Classification ... – PowerPoint PPT presentation

Slides: 73
Provided by: Joha162
Tags: nouns | plural | possessive | pub | quiz

Transcript and Presenter's Notes

Title: Pub Quiz


1
Pub Quiz
2
Question Answering
  • Lecture 1 (Last week): Introduction; History of
    QA; Architecture of a QA system; Evaluation.
  • Lecture 2 (Today): Question Classification; NLP
    techniques for question analysis: Tokenisation,
    Lemmatisation, POS-tagging, Parsing, WordNet.
  • Lecture 3 (Next lecture): Retrieving Answers;
    Document pre-processing; Named Entity
    Recognition; Anaphora Resolution; Matching;
    Reranking; Sanity checking.

3
Architecture of a QA system
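  • question → Question Analysis → query, answer-type,
    question representation
  • query + corpus → IR → documents/passages
  • documents/passages → Document Analysis → passage
    representation
  • answer-type + question representation + passage
    representation → Answer Extraction → answers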
5
Syntactically Distinguishing Questions
  • Wh-questions
  • Where was Franz Kafka born?
  • How many countries are members of OPEC?
  • Who is Thom Yorke?
  • Why did David Koresh ask the FBI for a word
    processor?
  • How did Frank Zappa die?
  • Which boxer beat Muhammad Ali?

6
Syntactically Distinguishing Questions
  • Yes-no questions
  • Does light have weight?
  • Scotland is part of England: true or false?
  • Choice-questions
  • Did Italy or Germany win the world cup in 1982?
  • Who is Harry Potter's best friend: Ron, Hermione
    or Sirius?

7
Syntactically Distinguishing Questions
  • Imperative
  • Name four European countries that produce wine.
  • Give the date of birth of Franz Kafka.
  • Declarative
  • I would like to know when Jim Morrison was born.
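
The syntactic distinctions above can be approximated with a few surface cues. The following is a minimal sketch, not a method from the lecture: the word lists and the function name `syntactic_type` are assumptions made for illustration, and the rules are deliberately crude (any question containing the word "or" is treated as a choice question).

```python
# Hypothetical word lists, chosen only to cover the example questions.
WH_WORDS = {"who", "whom", "whose", "what", "which", "when", "where", "why", "how"}
AUXILIARIES = {"is", "are", "was", "were", "do", "does", "did", "can", "could",
               "will", "would", "has", "have", "had", "should"}
IMPERATIVE_VERBS = {"name", "give", "list", "tell", "state"}


def syntactic_type(question):
    """Guess the syntactic class of a question from surface cues only."""
    text = question.lower().strip().rstrip("?.!")
    words = text.replace(",", " ").replace(":", " ").split()
    if "true or false" in " ".join(words):
        return "yes-no"
    if "or" in words:                 # crude cue for a choice question
        return "choice"
    if words[0] in WH_WORDS:
        return "wh"
    if words[0] in AUXILIARIES:
        return "yes-no"
    if words[0] in IMPERATIVE_VERBS:
        return "imperative"
    return "declarative"


print(syntactic_type("Where was Franz Kafka born?"))
print(syntactic_type("Did Italy or Germany win the world cup in 1982?"))
print(syntactic_type("Name four European countries that produce wine."))
```

A wh-question that happens to contain "or" would be misclassified by this sketch, which is one reason real systems rely on parsing rather than keyword cues.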

8
Semantically Distinguishing Questions
  • Divide questions according to their expected
    answer type
  • Simple Answer-Type Typology
  • PERSON
  • NUMERAL
  • DATE
  • MEASURE
  • LOCATION
  • ORGANISATION
  • ENTITY

11
Expected Answer Types
  • DATE
  • When was JFK killed?
  • In what year did Rome become the capital of
    Italy?
  • PERSON
  • Who won the Nobel prize for Peace?
  • Which rock singer wrote Lithium?
  • NUMERAL
  • How many inhabitants does Rome have?
  • What's the population of Scotland?
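
Some of these answer types can be read off the opening words of the question. The sketch below assumes a small hand-written rule table (`RULES` and `expected_answer_type` are illustrative names, not part of the lecture) and uses ENTITY as a catch-all, which is also an assumption.

```python
import re

# Hypothetical pattern table: (regular expression, answer type).
RULES = [
    (r"^(when|in what year|what year)\b", "DATE"),
    (r"^(who|whom|whose)\b", "PERSON"),
    (r"^how (many|much)\b", "NUMERAL"),
    (r"^where\b", "LOCATION"),
]


def expected_answer_type(question):
    """Return the first answer type whose pattern matches the question."""
    q = question.lower().strip()
    for pattern, label in RULES:
        if re.search(pattern, q):
            return label
    return "ENTITY"  # assumed fallback when no surface pattern applies


for q in ["When was JFK killed?",
          "Who won the Nobel prize for Peace?",
          "How many inhabitants does Rome have?",
          "Which rock singer wrote Lithium?"]:
    print(q, "->", expected_answer_type(q))
```

The last question falls through to the fallback: deciding that "rock singer" calls for a PERSON needs lexical knowledge about the noun, not just a pattern on the question word.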

12
Focus and Topic
  • Information expressed in a question can be
    structured into two parts
  • the focus: information that is asked for
  • the topic: information about the focus
  • Example
  • How many inhabitants does Rome have?
  • FOCUS: How many inhabitants
  • TOPIC: does Rome have

13
We need to know how to process natural language!
15
Generating Query Terms
  • Example 1
  • Question: Who discovered prions?
  • Text A: Dr. Stanley Prusiner received the
    Nobel prize for the discovery of prions.
  • Text B: Prions are a kind of protein that ...
  • Query terms?

16
Generating Query Terms
  • Example 2
  • Question: When did Franz Kafka die?
  • Text A: Kafka died in 1924.
  • Text B: Dr. Franz died in 1971.
  • Query terms?

17
Generating Query Terms
  • Example 3
  • Question: How did actor James Dean die?
  • Text: James Dean was killed in a car accident.
  • Query terms?
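
One naive way to answer "Query terms?" is to keep every question word that is not a function word. This is only a sketch under that assumption; the stopword list and the name `query_terms` are invented for illustration.

```python
import re

# Hypothetical stopword list, just large enough for the three examples.
STOPWORDS = {"who", "when", "how", "did", "the", "a", "an", "of", "in", "was", "is"}


def query_terms(question):
    """Lowercase the question, drop punctuation and stopwords."""
    tokens = re.findall(r"\w+", question.lower())
    return [t for t in tokens if t not in STOPWORDS]


print(query_terms("Who discovered prions?"))
print(query_terms("When did Franz Kafka die?"))
print(query_terms("How did actor James Dean die?"))
```

The resulting terms expose the problems the examples point at: "discovered" does not match "discovery", "die" matches neither "died" nor "killed", and "franz" alone also matches the text about Dr. Franz.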

18
We need to know how to process natural language!
23
Difference in structure
  • Example
  • Question: When did Franz Kafka die?
  • Text A: The mother of Franz Kafka died in 1918.
  • Text B: Kafka died in 1924.
  • Text C: Both Kafka and Lenin died in 1924.
  • Text D: Max Brod, a friend of Kafka, died in 1930.
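
A small experiment shows why word overlap alone is not enough for these texts. The sketch assumes the query terms "franz", "kafka" and "died" (the verb is given in its past form by hand) and simply counts how many of them occur in each text; `overlap` is an illustrative helper.

```python
import re


def overlap(terms, text):
    """Count how many of the query terms occur in the text."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return len(set(terms) & tokens)


QUERY = ["franz", "kafka", "died"]  # assumed query terms
TEXTS = {
    "A": "The mother of Franz Kafka died in 1918.",
    "B": "Kafka died in 1924.",
    "C": "Both Kafka and Lenin died in 1924.",
    "D": "Max Brod, a friend of Kafka, died in 1930.",
}

scores = {name: overlap(QUERY, text) for name, text in TEXTS.items()}
print(scores)
```

Text A, which is about Kafka's mother, gets the highest score, and Text D ties with the correct Text B. Telling them apart requires knowing who the subject of "died" is, which is a matter of sentence structure.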

24
We need to know how to process natural language!
25
Natural Language is messy!
  • We need ways to automate the process of
    manipulating natural language
  • Punctuation
  • The way words are composed
  • The relationship between wordforms
  • The relationship between words
  • The structure of phrases
  • This is where NLP (Natural Language Processing)
    comes in!

26
NLP Techniques
  • Tokenisation
  • Lemmatisation
  • Part of Speech Tagging
  • Syntactic analysis (parsing)
  • WordNet

28
Tokenisation
  • Tokenisation is the task that splits words from
    punctuation
  • semicolons, colons ; :
  • exclamation marks, question marks ! ?
  • commas and full stops , .
  • quotes " '
  • Tokens are normally split by spaces

30
Tokenisation Example 1
  • Input (9 tokens)
  • When was the Buckingham Palace built in London,
    England?
  • Output (11 tokens)
  • When was the Buckingham Palace built in London
    , England ?

32
Tokenisation Example 2
  • Input (7 tokens)
  • What year did "Snow White" come out?
  • Output (10 tokens)
  • What year did " Snow White " come out ?
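
Both examples can be reproduced with a one-line regular expression. This is a minimal sketch rather than a full tokeniser: `tokenise` is an illustrative name, and the pattern treats every punctuation character as its own token.

```python
import re


def tokenise(text):
    """Split text into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


example_1 = "When was the Buckingham Palace built in London, England?"
example_2 = 'What year did "Snow White" come out?'

for sentence in (example_1, example_2):
    tokens = tokenise(sentence)
    print(len(sentence.split()), "->", len(tokens), tokens)
```

Because every punctuation mark is split off, this sketch would also break "U.S." into four tokens and "don't" into three, so abbreviations and contractions need separate handling.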

33
Tokenisation combined words
  • Combined words are split
  • I'd → I 'd
  • country's → country 's
  • won't → will n't
  • don't! → do n't !
  • Some Italian examples
  • gliel'ha detto → gli lo ha detto
  • posso prenderlo → posso prendere lo
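
The English splits can be sketched with two patterns plus a table for irregular forms (with apostrophes written out, e.g. don't → do n't). The table `IRREGULAR` and the function `split_clitics` are assumptions for illustration and cover English only; the Italian cases would need a morphological lexicon.

```python
import re

# Hypothetical table for forms whose stem changes when split.
IRREGULAR = {"won't": ["will", "n't"]}


def split_clitics(token):
    """Split a contracted English token into its parts."""
    if token.lower() in IRREGULAR:
        return list(IRREGULAR[token.lower()])
    # Negation: don't -> do n't
    match = re.fullmatch(r"(.+)(n't)", token, flags=re.IGNORECASE)
    if match:
        return [match.group(1), match.group(2)]
    # Other clitics: I'd -> I 'd, country's -> country 's
    match = re.fullmatch(r"(.+)('(?:s|d|ll|re|ve|m))", token, flags=re.IGNORECASE)
    if match:
        return [match.group(1), match.group(2)]
    return [token]


for word in ["I'd", "country's", "won't", "don't", "Kafka"]:
    print(word, "->", split_clitics(word))
```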

34
Difficulties with tokenisation
  • Abbreviations, acronyms
  • When was the U.S. invasion of Haiti?
  • In particular if the abbreviation or acronym is
    the last word of a sentence
  • Heuristic: look at the next word; if it starts
    with an uppercase letter, assume the sentence
    has ended
  • But think of cases such as Mr. Jones
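
The uppercase heuristic and its "Mr. Jones" exception can be written down directly. In this sketch the abbreviation list `KNOWN_ABBREVIATIONS` and the function `is_sentence_end` are assumptions; the input is taken to be whitespace-separated tokens with the full stop still attached.

```python
# Hypothetical list of titles that are assumed never to end a sentence.
KNOWN_ABBREVIATIONS = {"mr.", "mrs.", "dr.", "prof."}


def is_sentence_end(token, next_token):
    """Decide whether a token ending in a full stop closes the sentence."""
    if not token.endswith("."):
        return False
    if next_token is None:            # nothing follows: end of text
        return True
    if token.lower() in KNOWN_ABBREVIATIONS:
        return False
    # Heuristic: an uppercase next word signals a new sentence.
    return next_token[:1].isupper()


print(is_sentence_end("U.S.", "invasion"))
print(is_sentence_end("U.S.", "The"))
print(is_sentence_end("Mr.", "Jones"))
```

The heuristic still fails when an abbreviation is followed by a proper name inside the same sentence, so it should be read as a first approximation only.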

35
Why is tokenisation important?
  • To look up a word in an electronic dictionary
    (such as WordNet)
  • For all subsequent stages of processing
  • Lemmatisation
  • Parsing

36
NLP Techniques
  • Tokenisation
  • Lemmatisation
  • Part of Speech Tagging
  • Syntactic analysis (parsing)
  • WordNet

37
Lemmatisation
  • Lemmatising means
  • grouping morphological variants of words under a
    single headword
  • For example, the words am, was, are, is, were
    and been are grouped together under the
    headword be

39
Lemmatisation
  • In linguistic terminology, the variants taken
    together form a lexeme, and the headword is its
    lemma
  • Lexeme: a lexical unit, an abstraction over
    specific constructions
  • Other examples
  • dying, die, died, dies → die
  • car, cars → car
  • man, men → man
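
The simplest possible lemmatiser is a lookup table from word form to headword. The sketch below assumes such a table containing only the forms mentioned in this lecture; `LEMMAS` and `lemmatise` are illustrative names, and unknown words are returned unchanged (lowercased).

```python
# Hypothetical lookup table built from the examples in the slides.
LEMMAS = {
    "am": "be", "was": "be", "are": "be", "is": "be", "were": "be", "been": "be",
    "dying": "die", "died": "die", "dies": "die",
    "cars": "car",
    "men": "man",
}


def lemmatise(word):
    """Map a word form to its headword; fall back to the form itself."""
    form = word.lower()
    return LEMMAS.get(form, form)


print([lemmatise(w) for w in "Kafka died in 1924".split()])
print(lemmatise("were"), lemmatise("men"))
```

With lemmatisation in place, the question word "die" and the text word "died" map to the same headword and can be matched.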

40
NLP Techniques
  • Tokenisation
  • Lemmatisation
  • Part of Speech Tagging
  • Syntactic analysis (parsing)
  • WordNet

41
Traditional parts of speech
  • Verb
  • Noun
  • Pronoun
  • Adjective
  • Adverb
  • Preposition
  • Conjunction
  • Interjection

42
Parts of speech in NLP
  • CLAWS1 (132 tags)
  • Examples
  • NN: singular common noun (boy, pencil ...)
  • NN$: genitive singular common noun (boy's,
    parliament's ...)
  • NNP: singular common noun with word-initial
    capital (Austrian, American, Sioux, Eskimo ...)
  • NNP$: genitive singular common noun with
    word-initial capital (Sioux', Eskimo's,
    Austrian's, American's ...)
  • NNPS: plural common noun with word-initial
    capital (Americans ...)
  • NNPS$: genitive plural common noun with
    word-initial capital (Americans' ...)
  • NNS: plural common noun (pencils, skeletons,
    days, weeks ...)
  • NNS$: genitive plural common noun (boys',
    weeks' ...)
  • NNU: abbreviated unit of measurement unmarked
    for number (in, cc, kg ...)
  • Penn Treebank (45 tags)
  • Examples
  • JJ: adjective (green ...)
  • JJR: adjective, comparative (greener ...)
  • JJS: adjective, superlative (greenest ...)
  • MD: modal (could, will ...)
  • NN: noun, singular or mass (table ...)
  • NNS: noun, plural (tables ...)
  • NNP: proper noun, singular (John ...)
  • NNPS: proper noun, plural (Vikings ...)
  • PDT: predeterminer (both the boys)
  • POS: possessive ending (friend's)
  • PRP: personal pronoun (I, he, it ...)
  • PRP$: possessive pronoun (my, his ...)
  • RB: adverb (however, usually, naturally, here,
    good ...)
  • RBR: adverb, comparative (better ...)

44
POS tagged example
  • What WP
  • year NN
  • did VBD
  • "
  • Snow NNP
  • White NNP
  • "
  • come VB
  • out IN
  • ? .

45
Why is POS-tagging important?
  • To disambiguate words
  • For instance, to distinguish book used as a
    noun from book used as a verb
  • I like that book
  • Did you book a room?
  • Prerequisite for further processing stages, such
    as parsing
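
Once tokens carry tags, the two uses of "book" can be told apart by a simple lookup. In this sketch the tags are assigned by hand in Penn Treebank style (an assumption; a real system would run a tagger), and the mini-dictionary `GLOSSES` is invented for illustration.

```python
# Hand-tagged sentences (assumed tags, not tagger output).
SENTENCE_1 = [("I", "PRP"), ("like", "VBP"), ("that", "DT"), ("book", "NN")]
SENTENCE_2 = [("Did", "VBD"), ("you", "PRP"), ("book", "VB"),
              ("a", "DT"), ("room", "NN"), ("?", ".")]

# Hypothetical dictionary keyed on (word, coarse part of speech).
GLOSSES = {
    ("book", "noun"): "a written work",
    ("book", "verb"): "to reserve in advance",
}


def coarse(tag):
    """Collapse Penn Treebank tags into noun / verb / other."""
    if tag.startswith("NN"):
        return "noun"
    if tag.startswith("VB"):
        return "verb"
    return "other"


def gloss_for(word, tag):
    """Look up the sense of a word given its part-of-speech tag."""
    return GLOSSES.get((word.lower(), coarse(tag)))


for sentence in (SENTENCE_1, SENTENCE_2):
    for word, tag in sentence:
        if word == "book":
            print(tag, "->", gloss_for(word, tag))
```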

46
NLP Techniques
  • Tokenisation
  • Lemmatisation
  • Part of Speech Tagging
  • Syntactic analysis (parsing)
  • WordNet

47
What is Parsing
  • Parsing is the process of assigning a syntactic
    structure to a sequence of words
  • The syntactic structure is defined using a
    grammar
  • A grammar consists of a set of symbols (terminal
    and non-terminal symbols) and production rules
    (grammar rules)
  • The lexicon is built over the terminal symbols
    (i.e., the words)

48
Syntactic Categories
  • The non-terminal symbols correspond to syntactic
    categories
  • Det (determiner)
  • N (noun)
  • IV (intransitive verb)
  • TV (transitive verb)
  • PN (proper name)
  • Prep (preposition)
  • NP (noun phrase): the car
  • PP (prepositional phrase): at the table
  • VP (verb phrase): saw a car
  • S (sentence): Mia likes Vincent

49
Example Grammar
  • Lexicon
  • Det: which, a, the, ...
  • N: rock, singer, ...
  • IV: die, walk, ...
  • TV: kill, write, ...
  • PN: John, Lithium, ...
  • Prep: on, from, to, ...
  • Grammar Rules
  • S → NP VP
  • NP → Det N
  • NP → PN
  • N → N N
  • N → N PP
  • VP → TV NP
  • VP → IV
  • PP → Prep NP
  • VP → VP PP
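
The example grammar is small enough to encode as plain data and test with a recogniser. The sketch below uses a CKY-style bottom-up chart, which is a swapped-in technique and not the backtracking derivation walked through in this lecture. The inflected form "wrote" is added to the lexicon by hand (an assumption; a lemmatiser would normally map it to "write").

```python
from collections import defaultdict

LEXICON = {
    "which": {"Det"}, "a": {"Det"}, "the": {"Det"},
    "rock": {"N"}, "singer": {"N"},
    "die": {"IV"}, "walk": {"IV"},
    "kill": {"TV"}, "write": {"TV"}, "wrote": {"TV"},  # "wrote" added by hand
    "john": {"PN"}, "lithium": {"PN"},
    "on": {"Prep"}, "from": {"Prep"}, "to": {"Prep"},
}
UNARY = {"PN": {"NP"}, "IV": {"VP"}}           # NP -> PN, VP -> IV
BINARY = {
    ("NP", "VP"): {"S"}, ("Det", "N"): {"NP"},
    ("N", "N"): {"N"}, ("N", "PP"): {"N"},
    ("TV", "NP"): {"VP"}, ("Prep", "NP"): {"PP"},
    ("VP", "PP"): {"VP"},
}


def close_unary(categories):
    """Add every category reachable through unary rules."""
    result, agenda = set(categories), list(categories)
    while agenda:
        for parent in UNARY.get(agenda.pop(), ()):
            if parent not in result:
                result.add(parent)
                agenda.append(parent)
    return result


def recognise(words):
    """Return True if the words can be analysed as a sentence (S)."""
    n = len(words)
    chart = defaultdict(set)                   # (start, end) -> categories
    for i, word in enumerate(words):
        chart[i, i + 1] = close_unary(LEXICON.get(word.lower(), set()))
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end, found = start + width, set()
            for mid in range(start + 1, end):
                for left in chart[start, mid]:
                    for right in chart[mid, end]:
                        found |= BINARY.get((left, right), set())
            chart[start, end] = close_unary(found)
    return "S" in chart[0, n]


print(recognise("Which rock singer wrote Lithium".split()))
print(recognise("Which rock wrote".split()))
```

A chart avoids backtracking because all analyses of each span are kept side by side: both "Which rock" as NP and "rock singer" as N are recorded, and only the combination that leads to S is used.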

50
The Parser
  • A parser automates the process of parsing
  • The input of the parser is a string of words
    (possibly annotated with POS-tags)
  • The output of a parser is a parse tree,
    connecting all the words
  • The way a parse tree is constructed is also
    called a derivation

51
Derivation Example
  • Which rock singer wrote Lithium

52
Lexical stage
  • [Det Which] [N rock] [N singer] [TV wrote] [PN Lithium]

53
Use rule NP → Det N
  • [NP [Det Which] [N rock]] [N singer] [TV wrote] [PN Lithium]

54
Use rule NP → PN
  • [NP [Det Which] [N rock]] [N singer] [TV wrote] [NP [PN Lithium]]

55
Use rule VP → TV NP
  • [NP [Det Which] [N rock]] [N singer] [VP [TV wrote] [NP [PN Lithium]]]

56
Backtracking
  • [NP [Det Which] [N rock]] [N singer] [VP [TV wrote] [NP [PN Lithium]]]
  • The NP over "Which rock" leaves "singer"
    unattached, so this NP is retracted

57
Use rule N → N N
  • [Det Which] [N [N rock] [N singer]] [VP [TV wrote] [NP [PN Lithium]]]

58
Use rule NP → Det N
  • [NP [Det Which] [N [N rock] [N singer]]] [VP [TV wrote] [NP [PN Lithium]]]

59
Use rule S → NP VP
  • [S [NP [Det Which] [N [N rock] [N singer]]] [VP [TV wrote] [NP [PN Lithium]]]]

60
Syntactic head
  • [S [NP [Det Which] [N [N rock] [N singer]]] [VP [TV wrote] [NP [PN Lithium]]]]

61
Parse Tree (another example)
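  • The mother of Franz Kafka died in 1918
  • [S [NP [Det The] [N [N mother] [PP [Prep of] [NP [PN Franz] [PN Kafka]]]]]
       [VP [VP [IV died]] [PP [Prep in] [NP 1918]]]]

62
Syntactic head
  • The mother of Franz Kafka died in 1918
  • [S [NP [Det The] [N [N mother] [PP [Prep of] [NP [PN Franz] [PN Kafka]]]]]
       [VP [VP [IV died]] [PP [Prep in] [NP 1918]]]]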

63
Using a parser
  • Normally expects tokenised and POS-tagged input
  • Examples of wide-coverage parsers
  • Charniak parser
  • Collins parser
  • RASP (Carroll & Briscoe)
  • CCG parser (Clark & Curran)

64
NLP Techniques
  • Tokenisation
  • Lemmatisation
  • Part of Speech Tagging
  • Syntactic analysis (parsing)
  • WordNet

65
WordNet
  • Electronic dictionary
  • Not only words and definitions, but also
    relations between words
  • Four parts of speech
  • Nouns
  • Verbs
  • Adjectives
  • Adverbs

66
WordNet SynSets
  • Words are organised in SynSets
  • A SynSet is a group of words with the same
    meaning --- in other words, a set of synonyms
  • Example: Rome, Roma, Eternal City, Italian
    Capital, capital of Italy

67
Senses
  • A word can have several different meanings
  • Example: plant
  • A building for industrial labour
  • A living organism lacking the power of locomotion
  • The different meanings of a word are called
    senses
  • Therefore, one word can occur in more than one
    SynSet in WordNet

68
SynSet Example
  • mug, mugful: the quantity that can be held in
    a mug
  • chump, fool, gull, mark, patsy, fall guy,
    sucker, soft touch, mug: a person who is
    gullible and easy to take advantage of
  • countenance, physiognomy, phiz, visage, kisser,
    smiler, mug: the human face
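
The three SynSets above can be stored as (word set, gloss) pairs, which makes it easy to list all senses of a word. This is a toy data structure written for illustration, not the WordNet file format or API; `SYNSETS` and `senses` are assumed names.

```python
# The three SynSets from the slide, as (set of words, gloss) pairs.
SYNSETS = [
    ({"mug", "mugful"},
     "the quantity that can be held in a mug"),
    ({"chump", "fool", "gull", "mark", "patsy", "fall guy",
      "sucker", "soft touch", "mug"},
     "a person who is gullible and easy to take advantage of"),
    ({"countenance", "physiognomy", "phiz", "visage", "kisser",
      "smiler", "mug"},
     "the human face"),
]


def senses(word):
    """Return the gloss of every SynSet that contains the word."""
    return [gloss for words, gloss in SYNSETS if word in words]


for gloss in senses("mug"):
    print("mug:", gloss)
print(senses("visage"))
```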

69
Hypernyms
  • Hypernymy is a WordNet relation defined between
    two SynSets
  • If A is a hypernym of B, then A is more generic
    than B
  • The inverse of hypernymy is hyponymy
  • If A is a hyponym of B, then A is more specific
    than B
  • Use these relations transitively
  • Examples
  • cow and horse are hyponyms of animal
  • publication is a hypernym of book
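
Using the relation transitively means following hypernym links upwards until the more generic term is found. The sketch below assumes a simplified toy hierarchy with one parent per word; the intermediate nodes are illustrative and do not reproduce the actual WordNet paths, where a SynSet may also have several hypernyms.

```python
# Toy hierarchy (assumption): each word points to one direct hypernym.
PARENT = {
    "cow": "cattle",
    "cattle": "mammal",
    "horse": "mammal",
    "mammal": "animal",
    "book": "publication",
}


def hypernyms(word):
    """Follow the hypernym links upwards and collect every ancestor."""
    chain = []
    while word in PARENT:
        word = PARENT[word]
        chain.append(word)
    return chain


def is_hyponym(specific, generic):
    """True if `generic` is reachable from `specific` via hypernym links."""
    return generic in hypernyms(specific)


print(hypernyms("cow"))
print(is_hyponym("horse", "animal"), is_hyponym("animal", "horse"))
```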

70
Examples using WordNet
  • Which rock singer
  • singer is a hyponym of person, therefore expected
    answer type is PERSON
  • What is the population of
  • population is a hyponym of number, hence answer
    type NUMERAL
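
The same upward walk can decide the expected answer type from the noun that follows the question word. This sketch again assumes a toy hierarchy (`PARENT`) and a hand-made mapping from generic nouns to answer types (`ANSWER_TYPES`); the ENTITY fallback is an assumption as well.

```python
# Toy hypernym links (assumption, not the real WordNet paths).
PARENT = {
    "singer": "musician",
    "musician": "person",
    "population": "number",
}
# Hypothetical mapping from generic nouns to answer types.
ANSWER_TYPES = {"person": "PERSON", "number": "NUMERAL"}


def answer_type(noun):
    """Walk up the hierarchy until a noun with a known answer type is found."""
    current = noun
    while current is not None:
        if current in ANSWER_TYPES:
            return ANSWER_TYPES[current]
        current = PARENT.get(current)
    return "ENTITY"  # assumed fallback


print(answer_type("singer"))      # Which rock singer ...
print(answer_type("population"))  # What is the population of ...
```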

71
How to use NLP tools?
  • There is a large set of tools available on the
    web, most of them free for research
  • Examples of integrated text processing
    environments
  • GATE (University of Sheffield)
  • TTT (University of Edinburgh)
  • LingPipe
  • For a general overview of NLP tools, see
    http://registry.dfki.de/

72
Question Answering
  • Lecture 1 (Last week): Introduction; History of
    QA; Architecture of a QA system; Evaluation.
  • Lecture 2 (Today): Question Classification; NLP
    techniques for question analysis: Tokenisation,
    Lemmatisation, POS-tagging, Parsing, WordNet.
  • Lecture 3 (Next lecture): Retrieving Answers;
    Document pre-processing; Named Entity
    Recognition; Anaphora Resolution; Matching;
    Reranking; Sanity checking.