Title: Question Answering
1 Question Answering
Marti Hearst, November 14, 2005
2 Question Answering
- Outline
- Introduction to QA
- A typical full-fledged QA system
- A very simple system, in response to this
- An intermediate approach
- Incorporating a reasoning system
- Machine Learning of mappings
- Other question types (e.g., biography, definitions)
3 A Spectrum of Search Types
- What is the typical height of a giraffe?
- What are some good ideas for landscaping my client's yard?
- What are some promising untried treatments for Raynaud's disease?
4 Beyond Document Retrieval
- Document Retrieval
- Users submit queries corresponding to their information needs.
- System returns a (voluminous) list of full-length documents.
- It is the responsibility of the users to find information of interest within the returned documents.
- Open-Domain Question Answering (QA)
- Users ask questions in natural language.
- What is the highest volcano in Europe?
- System returns a list of short answers.
- "... Under Mount Etna, the highest volcano in Europe, perches the fabulous town ..."
- A real use for NLP
5 Questions and Answers
- What is the height of a typical giraffe?
- The result can be a simple answer, extracted from existing web pages.
- Can specify with keywords or a natural language query.
- However, most web search engines are not set up to handle questions properly.
- Get different results using a question vs. keywords.
10 The Problem of Question Answering
Natural language questions, not keyword queries:
What is the nationality of Pope John Paul II?
Short text fragments, not URL lists:
"... stabilize the country with its help, the Catholic hierarchy stoutly held out for pluralism, in large part at the urging of Polish-born Pope John Paul II. When the Pope emphatically defended the Solidarity trade union during a 1987 tour of the ..."
11 Question Answering from text
- With massive collections of full-text documents, simply finding relevant documents is of limited use: we want answers.
- QA: give the user a (short) answer to their question, perhaps supported by evidence.
- An alternative to standard IR.
- The first problem area in IR where NLP is really making a difference.
12 People want to ask questions
Examples from AltaVista query log:
- who invented surf music?
- how to make stink bombs
- where are the snowdens of yesteryear?
- which english translation of the bible is used in official catholic liturgies?
- how to do clayart
- how to copy psx
- how tall is the sears tower?
Examples from Excite query log (12/1999):
- how can i find someone in texas
- where can i find information on puritan religion?
- what are the 7 wonders of the world
- how can i eliminate stress
- What vacuum cleaner does Consumers Guide recommend
13 A Brief (Academic) History
- In some sense question answering is not a new research area
- Question answering systems can be found in many areas of NLP research, including:
- Natural language database systems
- A lot of early NLP work on these
- Problem-solving systems
- STUDENT (Winograd 77)
- LUNAR (Woods & Kaplan 77)
- Spoken dialog systems
- Currently very active and commercially relevant
- But the focus on open-domain QA is new
- First modern system: MURAX (Kupiec, SIGIR'93)
- Trivial Pursuit questions
- Encyclopedia answers
- FAQFinder (Burke et al. 97)
- TREC QA competition (NIST, 1999-present)
14 AskJeeves
- AskJeeves is probably the most hyped example of question answering
- How it used to work:
- Do pattern matching to match a question to their own knowledge base of questions
- If a match is found, return a human-curated answer to that known question
- If that fails, fall back to regular web search
- (Seems to be more of a meta-search engine now)
- A potentially interesting middle ground, but a fairly weak shadow of real QA
15 Question Answering at TREC
- The question answering competition at TREC consists of answering a set of 500 fact-based questions, e.g., "When was Mozart born?"
- Has really pushed the field forward.
- The document set
- Newswire textual documents from the LA Times, San Jose Mercury News, Wall Street Journal, NY Times, etc.; over 1M documents now.
- Well-formed lexically, syntactically, and semantically (they were reviewed by professional editors).
- The questions
- Hundreds of new questions every year; the total is 2400
- Task
- Initially: extract at most 5 answers, long (250 bytes) and short (50 bytes).
- Now: extract only one exact answer.
- Several other sub-tasks added later: definition, list, biography.
16 Sample TREC questions
1. Who is the author of the book, "The Iron Lady: A Biography of Margaret Thatcher"?
2. What was the monetary value of the Nobel Peace Prize in 1989?
3. What does the Peugeot company manufacture?
4. How much did Mercury spend on advertising in 1993?
5. What is the name of the managing director of Apricot Computer?
6. Why did David Koresh ask the FBI for a word processor?
7. What is the name of the rare neurological disease with symptoms such as involuntary movements (tics), swearing, and incoherent vocalizations (grunts, shouts, etc.)?
17 TREC Scoring
- For the first three years, systems were allowed to return 5 ranked answer snippets (50/250 bytes) for each question.
- Mean Reciprocal Rank (MRR) scoring
- Each question is assigned the reciprocal rank of the first correct answer: if the correct answer is at position k, the score is 1/k.
- 1, 0.5, 0.33, 0.25, 0.2, 0 for positions 1, 2, 3, 4, 5, and 6 or lower (a minimal sketch follows below)
- Mainly Named Entity answers (person, place, date, ...)
- From 2002 on, systems are only allowed to return a single exact answer, and the notion of confidence has been introduced.
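A minimal sketch of MRR scoring (not the official TREC scorer): each question contributes the reciprocal rank of its first correct answer, or 0 if none of the returned snippets is correct.

```python
# MRR sketch: ranked_answers is the system's ranked list for one question,
# is_correct is a predicate standing in for the TREC answer judgments.

def reciprocal_rank(ranked_answers, is_correct):
    for k, answer in enumerate(ranked_answers, start=1):
        if is_correct(answer):
            return 1.0 / k
    return 0.0

def mean_reciprocal_rank(runs):
    """runs: one (ranked_answers, is_correct) pair per question."""
    return sum(reciprocal_rank(r, c) for r, c in runs) / len(runs)

# Correct answers at ranks 1 and 3, plus one miss -> (1 + 1/3 + 0) / 3 ≈ 0.44
print(mean_reciprocal_rank([
    (["1756", "1791"], lambda a: a == "1756"),
    (["Paris", "Rome", "Vienna"], lambda a: a == "Vienna"),
    (["Etna"], lambda a: a == "Everest"),
]))
```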
18 Top Performing Systems
- In 2003, the best performing systems at TREC could answer approximately 60-70% of the questions
- Approaches and successes have varied a fair deal
- Knowledge-rich approaches, using a vast array of NLP techniques, stole the show in 2000-2003
- Notably Harabagiu, Moldovan et al. (SMU/UTD/LCC)
- Statistical systems starting to catch up
- The AskMSR system stressed how much could be achieved by very simple methods with enough text (and now various copycats)
- People are experimenting with machine learning methods
- A middle ground is to use a large collection of surface matching patterns (ISI)
19 Example QA System
- This system contains many components used by other systems, but is more complex in some ways
- Most work completed in 2001; there have been advances by this group and others since then.
- Next slides based mainly on:
- Pasca and Harabagiu, High-Performance Question Answering from Large Text Collections, SIGIR'01.
- Pasca and Harabagiu, Answer Mining from Online Documents, ACL'01.
- Harabagiu, Pasca, Maiorano, Experiments with Open-Domain Textual Question Answering, COLING'00.
20 QA Block Architecture
[Block diagram: the question Q goes to Question Processing (which uses WordNet, a parser, and NER) to produce keywords and question semantics; Passage Retrieval (backed by Document Retrieval) uses the keywords to produce passages; Answer Extraction (which also uses WordNet, a parser, and NER) produces the answer A.]
21 Question Processing Flow
[Flow diagram: the question Q is parsed and a question representation is constructed; answer type detection yields the AT category, keyword selection yields the keywords, and together they form the question's semantic representation.]
22 Question Stems and Answer Types
Identify the semantic category of expected answers.

| Question | Question stem | Answer type |
| --- | --- | --- |
| Q555: What was the name of Titanic's captain? | What | Person |
| Q654: What U.S. Government agency registers trademarks? | What | Organization |
| Q162: What is the capital of Kosovo? | What | City |
| Q661: How much does one ton of cement cost? | How much | Quantity |

- Other question stems: Who, Which, Name, How hot...
- Other answer types: Country, Number, Product...
23 Detecting the Expected Answer Type
- In some cases, the question stem is sufficient to indicate the answer type (AT)
- Why → REASON
- When → DATE
- In many cases, the question stem is ambiguous
- Examples
- What was the name of Titanic's captain?
- What U.S. Government agency registers trademarks?
- What is the capital of Kosovo?
- Solution: select additional question concepts (AT words) that help disambiguate the expected answer type
- Examples
- captain
- agency
- capital
24 Answer Type Taxonomy
- Encodes 8707 English concepts to help recognize the expected answer type
- Mapping to parts of WordNet done by hand
- Can connect to Noun, Adj, and/or Verb subhierarchies
25 Answer Type Detection Algorithm
- Select the answer type word from the question representation.
- Select the word(s) connected to the question. Some content-free words are skipped (e.g., "name").
- From the previous set, select the word with the highest connectivity in the question representation.
- Map the AT word into a previously built AT hierarchy
- The AT hierarchy is based on WordNet, with some concepts associated with semantic categories, e.g., writer → PERSON.
- Select the AT(s) from the first hypernym(s) associated with a semantic category.
26 Answer Type Hierarchy
[Figure: fragment of the answer type hierarchy for the PERSON category.]
27 Evaluation of Answer Type Hierarchy
- This evaluation was done in 2001
- Controlled the variation of the number of WordNet synsets included in the answer type hierarchy.
- Test on 800 TREC questions.

| Hierarchy coverage | Precision score (50-byte answers) |
| --- | --- |
| 0 | 0.296 |
| 3 | 0.404 |
| 10 | 0.437 |
| 25 | 0.451 |
| 50 | 0.461 |

- The derivation of the answer type is the main source of unrecoverable errors in the QA system
28 Keyword Selection
- The answer type indicates what the question is looking for, but provides insufficient context to locate the answer in a very large document collection
- Lexical terms (keywords) from the question, possibly expanded with lexical/semantic variations, provide the required context.
29 Lexical Term Extraction
- Questions approximated by sets of unrelated words (lexical terms)
- Similar to bag-of-words IR models

| Question (from TREC QA track) | Lexical terms |
| --- | --- |
| Q002: What was the monetary value of the Nobel Peace Prize in 1989? | monetary, value, Nobel, Peace, Prize |
| Q003: What does the Peugeot company manufacture? | Peugeot, company, manufacture |
| Q004: How much did Mercury spend on advertising in 1993? | Mercury, spend, advertising, 1993 |
| Q005: What is the name of the managing director of Apricot Computer? | name, managing, director, Apricot, Computer |
30 Keyword Selection Algorithm
1. Select all non-stopwords in quotations
2. Select all NNP words in recognized named entities
3. Select all complex nominals with their adjectival modifiers
4. Select all other complex nominals
5. Select all nouns with adjectival modifiers
6. Select all other nouns
7. Select all verbs
8. Select the AT word (which was skipped in all previous steps)
(A simplified sketch of these heuristics follows below.)
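A minimal sketch of the ordered selection heuristics, assuming NLTK for tokenization and POS tagging; it approximates named entities with proper-noun tags and complex nominals with noun/adjective tags, so it is only a rough stand-in for the parser- and NER-based original.

```python
# Simplified keyword selection: earlier heuristics contribute keywords first.
# Requires: nltk.download('punkt'), nltk.download('averaged_perceptron_tagger'),
# nltk.download('stopwords')
import re
import nltk
from nltk.corpus import stopwords

def select_keywords(question):
    keywords = []

    def add(words):
        for w in words:
            if w.lower() not in keywords:
                keywords.append(w.lower())

    stop = set(stopwords.words("english"))
    # 1. Non-stopwords inside quotations
    for quoted in re.findall(r'"([^"]+)"', question):
        add(w for w in quoted.split() if w.lower() not in stop)

    tagged = nltk.pos_tag(nltk.word_tokenize(question))
    # 2. Proper nouns (rough stand-in for words in recognized named entities)
    add(w for w, t in tagged if t in ("NNP", "NNPS"))
    # 3-6. Nouns and their adjectival modifiers (stand-in for complex nominals)
    add(w for w, t in tagged if t.startswith("JJ") or t.startswith("NN"))
    # 7. Verbs, excluding auxiliaries caught by the stopword list
    add(w for w, t in tagged if t.startswith("VB") and w.lower() not in stop)
    return keywords

print(select_keywords("What researcher discovered the vaccine against Hepatitis-B?"))
```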
31 Keyword Selection Examples
- What researcher discovered the vaccine against Hepatitis-B?
- Hepatitis-B, vaccine, discover, researcher
- What is the name of the French oceanographer who owned Calypso?
- Calypso, French, own, oceanographer
- What U.S. government agency registers trademarks?
- U.S., government, trademarks, register, agency
- What is the capital of Kosovo?
- Kosovo, capital
32 Passage Retrieval
[Block diagram from slide 20, repeated to situate the Passage Retrieval stage between Question Processing and Answer Extraction.]
33 Passage Extraction Loop
- Passage Extraction Component
- Extracts passages that contain all selected keywords
- Passage size is dynamic
- Start position is dynamic
- Passage quality and keyword adjustment
- In the first iteration, use the first 6 keyword selection heuristics
- If the number of passages is lower than a threshold → the query is too strict → drop a keyword
- If the number of passages is higher than a threshold → the query is too relaxed → add a keyword
34 Passage Retrieval Architecture
[Flow diagram: Document Retrieval returns documents; Passage Extraction uses the keywords to produce passages; a passage quality check either sends the keywords back through Keyword Adjustment (no) or passes the passages on to Passage Scoring and Passage Ordering (yes), which output ranked passages.]
35 Passage Scoring
- Passages are scored based on keyword windows
- For example, if a question has a set of keywords {k1, k2, k3, k4}, and in a passage k1 and k2 are matched twice, k3 is matched once, and k4 is not matched, four windows are built, one for each combination of the matched occurrences of k1 and k2.
[Figure: Windows 1-4 built over the passage fragment "... k1 k2 ... k3 k2 k1 ...".]
36 Passage Scoring
- Passage ordering is performed using a radix sort that involves three scores (a sketch follows below):
- SameWordSequenceScore (largest)
- Computes the number of words from the question that are recognized in the same sequence in the window
- DistanceScore (largest)
- The number of words that separate the most distant keywords in the window
- MissingKeywordScore (smallest)
- The number of unmatched keywords in the window
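A minimal sketch of the three-score ordering, assuming each window is just a list of tokens; Python's tuple sort stands in for the radix sort over the three keys.

```python
# Order passage windows by (SameWordSequenceScore desc, DistanceScore desc,
# MissingKeywordScore asc), mirroring the radix sort described above.

def score_window(question_words, window_words):
    # SameWordSequenceScore: question words matched in question order (subsequence match)
    remaining = iter(window_words)
    same_seq = sum(1 for q in question_words if q in remaining)

    # DistanceScore: words separating the most distant matched keywords
    positions = [i for i, w in enumerate(window_words) if w in question_words]
    distance = (positions[-1] - positions[0]) if positions else 0

    # MissingKeywordScore: keywords not found in the window
    missing = sum(1 for q in question_words if q not in window_words)
    return same_seq, distance, missing

def order_windows(question_words, windows):
    def key(window):
        same_seq, distance, missing = score_window(question_words, window)
        return (-same_seq, -distance, missing)
    return sorted(windows, key=key)

keywords = ["private", "citizen", "fly", "space"]
windows = [["the", "first", "private", "citizen", "to", "fly", "in", "space"],
           ["space", "shuttle", "pilot", "private"]]
print(order_windows(keywords, windows))
```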
37 Answer Extraction
[Block diagram from slide 20, repeated to situate the Answer Extraction stage, which uses WordNet, the parser, and NER to produce the answer A from the retrieved passages.]
38 Ranking Candidate Answers
Q066: Name the first private citizen to fly in space.
- Answer type: Person
- Text passage: "Among them was Christa McAuliffe, the first private citizen to fly in space. Karen Allen, best known for her starring role in Raiders of the Lost Ark, plays McAuliffe. Brian Kerwin is featured as shuttle pilot Mike Smith..."
- Best candidate answer: Christa McAuliffe
39 Features for Answer Ranking
- relNMW: number of question terms matched in the answer passage
- relSP: number of question terms matched in the same phrase as the candidate answer
- relSS: number of question terms matched in the same sentence as the candidate answer
- relFP: flag set to 1 if the candidate answer is followed by a punctuation sign
- relOCTW: number of question terms matched, separated from the candidate answer by at most three words and one comma
- relSWS: number of terms occurring in the same order in the answer passage as in the question
- relDTW: average distance from the candidate answer to the question term matches
(SIGIR'01)
40 Answer Ranking based on Machine Learning
- A relative relevance score is computed for each pair of candidates (answer windows):
- relPAIR = w_SWS * Δrel_SWS + w_FP * Δrel_FP + w_OCTW * Δrel_OCTW + w_SP * Δrel_SP + w_SS * Δrel_SS + w_NMW * Δrel_NMW + w_DTW * Δrel_DTW + threshold
- If relPAIR is positive, then the first candidate of the pair is more relevant
- A perceptron model is used to learn the weights (a sketch follows below)
- Scores in the 0.50s MRR for short answers, in the 0.60s MRR for long answers
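A minimal sketch of pairwise ranking with a perceptron: the feature vector for a candidate pair is the difference of their relevance features, and the learned weights plus a bias term (the threshold) decide which candidate is more relevant.

```python
# Pairwise answer ranking with a perceptron over feature differences.
import random

FEATURES = ["SWS", "FP", "OCTW", "SP", "SS", "NMW", "DTW"]

def delta(cand_a, cand_b):
    """Feature difference Δrel between two candidates (dicts of feature values)."""
    return [cand_a[f] - cand_b[f] for f in FEATURES]

def train(pairs, epochs=20, lr=0.1):
    """pairs: list of (features_a, features_b, label), label +1 if a is more relevant."""
    w = [0.0] * len(FEATURES)
    bias = 0.0  # plays the role of the threshold term
    for _ in range(epochs):
        random.shuffle(pairs)
        for a, b, label in pairs:
            x = delta(a, b)
            score = sum(wi * xi for wi, xi in zip(w, x)) + bias
            if label * score <= 0:  # misranked pair: perceptron update
                w = [wi + lr * label * xi for wi, xi in zip(w, x)]
                bias += lr * label
    return w, bias

def first_is_more_relevant(w, bias, a, b):
    x = delta(a, b)
    return sum(wi * xi for wi, xi in zip(w, x)) + bias > 0
```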
41 Evaluation on the Web
- Test on 350 questions from TREC (Q250-Q600)
- Extract 250-byte answers

|  | Google | Answer extraction from Google | AltaVista | Answer extraction from AltaVista |
| --- | --- | --- | --- | --- |
| Precision score | 0.29 | 0.44 | 0.15 | 0.37 |
| Questions with a correct answer among top 5 returned answers | 0.44 | 0.57 | 0.27 | 0.45 |
42 Can we make this simpler?
- One reason systems became so complex is that they have to pick out one sentence within a small collection
- The answer is likely to be stated in a hard-to-recognize manner.
- Alternative idea:
- What happens with a much larger collection?
- The web is so huge that you're likely to see the answer stated in a form similar to the question
- Goal: make the simplest possible QA system by exploiting this redundancy in the web
- Use this as a baseline against which to compare more elaborate systems.
- The next slides are based on:
- Web Question Answering: Is More Always Better? Dumais, Banko, Brill, Lin, Ng, SIGIR'02
- An Analysis of the AskMSR Question-Answering System, Brill, Dumais, and Banko, EMNLP'02
43 AskMSR System Architecture
[System architecture diagram showing the five numbered steps, detailed on the following slides: rewrite the questions, query the search engine, gather n-grams, filter n-grams, and tile the answers.]
44 Step 1: Rewrite the questions
- Intuition: the user's question is often syntactically quite close to sentences that contain the answer
- Where is the Louvre Museum located?
- The Louvre Museum is located in Paris.
- Who created the character of Scrooge?
- Charles Dickens created the character of Scrooge.
45 Query rewriting
- Classify the question into one of seven categories
- Who is/was/are/were...?
- When is/did/will/are/were...?
- Where is/are/were...?
- a. Hand-crafted category-specific transformation rules (sketched below)
- e.g., for "where" questions, move "is" to all possible locations
- Look to the right of the query terms for the answer.
- Where is the Louvre Museum located?
- → is the Louvre Museum located
- → the is Louvre Museum located
- → the Louvre is Museum located
- → the Louvre Museum is located
- → the Louvre Museum located is
- b. Expected answer datatype (e.g., Date, Person, Location, ...)
- When was the French Revolution? → DATE
- Nonsense, but OK. It's only a few more queries to the search engine.
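A minimal sketch of the "where is" rewrite rule: move "is" into every position of the remainder and emit each string as a high-weight exact-phrase query, plus a low-weight bag-of-words fallback (weights 5 and 1, as on the next slide).

```python
# AskMSR-style query rewriting for "Where is X ...?" questions.
import re

def rewrite_where_is(question):
    m = re.match(r"(?i)where\s+is\s+(.+?)\??$", question.strip())
    if not m:
        return []
    words = m.group(1).split()
    rewrites = []
    for i in range(len(words) + 1):
        candidate = words[:i] + ["is"] + words[i:]
        rewrites.append(('"' + " ".join(candidate) + '"', 5))  # exact phrase, high weight
    rewrites.append((" ".join(words), 1))                      # plain keywords, low weight
    return rewrites

for query, weight in rewrite_where_is("Where is the Louvre Museum located?"):
    print(weight, query)
```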
46 Query Rewriting - weighting
- Some query rewrites are more reliable than others.
- Where is the Louvre Museum located?
- Weight 5 (if it matches, it's probably right): "the Louvre Museum is located"
- Weight 1 (lots of non-answers could come back too): Louvre Museum located
47 Step 2: Query search engine
- Send all rewrites to a Web search engine
- Retrieve the top N answers (100-200)
- For speed, rely just on the search engine's snippets, not the full text of the actual documents
48 Step 3: Gathering N-Grams
- Enumerate all N-grams (N = 1, 2, 3) in all retrieved snippets
- Weight of an n-gram: its occurrence count, with each occurrence weighted by the reliability (weight) of the rewrite rule that fetched the document (a sketch follows below)
- Example: Who created the character of Scrooge?
- Dickens 117
- Christmas Carol 78
- Charles Dickens 75
- Disney 72
- Carl Banks 54
- A Christmas 41
- Christmas Carol 45
- Uncle 31
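A minimal sketch of the weighted n-gram mining: every unigram, bigram, and trigram in a snippet is credited with the weight of the rewrite whose query retrieved that snippet.

```python
# Weighted n-gram mining over search-engine snippets.
from collections import Counter

def mine_ngrams(snippets_with_weights, max_n=3):
    """snippets_with_weights: iterable of (snippet_text, rewrite_weight)."""
    scores = Counter()
    for snippet, weight in snippets_with_weights:
        tokens = snippet.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                scores[" ".join(tokens[i:i + n])] += weight
    return scores

snippets = [
    ("Charles Dickens created the character of Scrooge", 5),
    ("Scrooge the miser appears in A Christmas Carol by Dickens", 1),
]
print(mine_ngrams(snippets).most_common(5))
```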
49 Step 4: Filtering N-Grams
- Each question type is associated with one or more data-type filters (regular expressions)
- When → Date
- Where → Location
- What
- Who → Person
- Boost the score of n-grams that match the regexp
- Lower the score of n-grams that don't match the regexp
- Details omitted from the paper.
50 Step 5: Tiling the Answers
- Tile the highest-scoring n-gram with overlapping n-grams; merged n-grams get the combined score and the old n-grams are discarded.
- Repeat until no more overlap.
- Example: "Charles Dickens" (20), "Dickens" (15), and "Mr Charles" (10) tile into "Mr Charles Dickens" with score 45 (a sketch follows below).
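A minimal sketch of answer tiling: greedily merge the highest-scoring n-gram with any n-gram it overlaps or contains, summing the scores, until no merge applies to the current best n-gram.

```python
# Greedy n-gram tiling.

def overlap_merge(a, b):
    """Return the tiled string if a and b overlap or one contains the other, else None."""
    ta, tb = a.split(), b.split()
    if all(w in ta for w in tb):
        return a
    if all(w in tb for w in ta):
        return b
    for k in range(min(len(ta), len(tb)), 0, -1):
        if ta[-k:] == tb[:k]:
            return " ".join(ta + tb[k:])
        if tb[-k:] == ta[:k]:
            return " ".join(tb + ta[k:])
    return None

def tile_answers(scored):
    """scored: dict mapping n-gram -> score."""
    items = dict(scored)
    while True:
        best = max(items, key=items.get)
        for other in list(items):
            if other == best:
                continue
            tiled = overlap_merge(best, other)
            if tiled is not None:
                combined = items.pop(best) + items.pop(other)
                items[tiled] = max(combined, items.get(tiled, 0))
                break
        else:
            return items  # no merge possible for the current best n-gram

print(tile_answers({"Charles Dickens": 20, "Dickens": 15, "Mr Charles": 10}))
```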
51 Results
- Standard TREC contest test-bed (TREC 2001): 1M documents; 900 questions
- The technique doesn't do too well (though it would have placed in the top 9 of 30 participants)
- MRR strict: .34
- MRR lenient: .43
- 9th place
52 Results
- From the EMNLP'02 paper
- MRR of .577; answers 61% correctly
- Would be near the top of the TREC-9 runs
- Breakdown of feature contribution
53 Issues
- Works best/only for Trivial Pursuit-style fact-based questions
- Limited/brittle repertoire of:
- question categories
- answer data types/filters
- query rewriting rules
54 Intermediate Approach: Surface pattern discovery
- Based on:
- Ravichandran, D. and Hovy, E.H., Learning Surface Text Patterns for a Question Answering System, ACL'02
- Hovy, et al., Question Answering in Webclopedia, TREC-9, 2000.
- Use of characteristic phrases
- "When was <person> born?"
- Typical answers:
- "Mozart was born in 1756."
- "Gandhi (1869-1948)..."
- Suggests regular expressions to help locate the correct answer (sketched below):
- "<NAME> was born in <BIRTHDATE>"
- "<NAME> (<BIRTHDATE>-"
55 Use Pattern Learning
- Examples:
- "The great composer Mozart (1756-1791) achieved fame at a young age"
- "Mozart (1756-1791) was a genius"
- "The whole world would always be indebted to the great music of Mozart (1756-1791)"
- The longest matching substring for all 3 sentences is "Mozart (1756-1791)"
- A suffix tree would extract "Mozart (1756-1791)" as an output, with a score of 3 (a simple sketch follows below)
- Reminiscent of IE pattern learning
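A minimal sketch of this step using a pairwise longest-common-substring dynamic program instead of a suffix tree; the result and its score match the example above.

```python
# Learn a candidate pattern as the longest substring shared by all examples.
from functools import reduce

def longest_common_substring(a, b):
    best = ""
    prev = [0] * (len(b) + 1)  # prev[j]: length of common suffix of a[:i-1] and b[:j]
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > len(best):
                    best = a[i - curr[j]:i]
        prev = curr
    return best

sentences = [
    "The great composer Mozart (1756-1791) achieved fame at a young age",
    "Mozart (1756-1791) was a genius",
    "The whole world would always be indebted to the great music of Mozart (1756-1791)",
]
pattern = reduce(longest_common_substring, sentences)
print(repr(pattern), "score:", sum(pattern in s for s in sentences))
```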
56 Pattern Learning (cont.)
- Repeat with different examples of the same question type
- Gandhi 1869, Newton 1642, etc.
- Some patterns learned for BIRTHDATE:
- a. born in <ANSWER>, <NAME>
- b. <NAME> was born on <ANSWER>,
- c. <NAME> (<ANSWER> -
- d. <NAME> (<ANSWER> - )
57 QA Typology from ISI
- Typology of typical question forms: 94 nodes (47 leaf nodes)
- Analyzed 17,384 questions (from answers.com)
58 Experiments
- 6 different question types from the Webclopedia QA Typology:
- BIRTHDATE
- LOCATION
- INVENTOR
- DISCOVERER
- DEFINITION
- WHY-FAMOUS
59 Experiments: pattern precision
- BIRTHDATE
- 1.0 <NAME> ( <ANSWER> - )
- 0.85 <NAME> was born on <ANSWER>,
- 0.6 <NAME> was born in <ANSWER>
- 0.59 <NAME> was born <ANSWER>
- 0.53 <ANSWER> <NAME> was born
- 0.50 - <NAME> ( <ANSWER>
- 0.36 <NAME> ( <ANSWER> -
- INVENTOR
- 1.0 <ANSWER> invents <NAME>
- 1.0 the <NAME> was invented by <ANSWER>
- 1.0 <ANSWER> invented the <NAME> in
60 Experiments (cont.)
- DISCOVERER
- 1.0 when <ANSWER> discovered <NAME>
- 1.0 <ANSWER>'s discovery of <NAME>
- 0.9 <NAME> was discovered by <ANSWER> in
- DEFINITION
- 1.0 <NAME> and related <ANSWER>
- 1.0 form of <ANSWER>, <NAME>
- 0.94 as <NAME>, <ANSWER> and
61 Experiments (cont.)
- WHY-FAMOUS
- 1.0 <ANSWER> <NAME> called
- 1.0 laureate <ANSWER> <NAME>
- 0.71 <NAME> is the <ANSWER> of
- LOCATION
- 1.0 <ANSWER>'s <NAME>
- 1.0 regional <ANSWER> <NAME>
- 0.92 near <NAME> in <ANSWER>
- Depending on the question type, the system gets high MRR (0.6-0.9), with higher results from use of the Web than the TREC QA collection
62 Shortcomings & Extensions
- Need for POS and/or semantic types
- "Where are the Rocky Mountains?"
- "Denver's new airport, topped with white fiberglass cones in imitation of the Rocky Mountains in the background, continues to lie empty"
- <NAME> in <ANSWER>
- An NE tagger and/or ontology could enable the system to determine that "background" is not a location
63 Shortcomings... (cont.)
- Long-distance dependencies
- "Where is London?"
- "London, which has one of the busiest airports in the world, lies on the banks of the river Thames"
- would require a pattern like: <QUESTION>, (<any_word>), lies on <ANSWER>
- The abundance and variety of Web data helps the system find an instance of its patterns without losing answers to long-distance dependencies
64 Shortcomings... (cont.)
- The system currently has only one anchor word
- Doesn't work for question types requiring multiple words from the question to be in the answer
- "In which county does the city of Long Beach lie?"
- "Long Beach is situated in Los Angeles County"
- required pattern: <Q_TERM_1> is situated in <ANSWER> <Q_TERM_2>
- Does not use case
- "What is a micron?"
- "...a spokesman for Micron, a maker of semiconductors, said SIMMs are..."
- If "Micron" had been capitalized in the question, this would be a perfect answer
65 The Importance of NER
- The results of the past 5 TREC evaluations of QA systems indicate that current state-of-the-art QA is determined by the recognition of Named Entities
- In TREC 2003 the LCC QA system extracted 289 correct answers for factoid questions
- The Named Entity Recognizer was responsible for 234 of them:

| Answer type | Count | Answer type | Count | Answer type | Count |
| --- | --- | --- | --- | --- | --- |
| QUANTITY | 55 | ORGANIZATION | 15 | PRICE | 3 |
| NUMBER | 45 | AUTHORED WORK | 11 | SCIENCE NAME | 2 |
| DATE | 35 | PRODUCT | 11 | ACRONYM | 1 |
| PERSON | 31 | CONTINENT | 5 | ADDRESS | 1 |
| COUNTRY | 21 | PROVINCE | 5 | ALPHABET | 1 |
| OTHER LOCATIONS | 19 | QUOTE | 5 | URI | 1 |
| CITY | 19 | UNIVERSITY | 3 |  |  |
66 The Special Case of Names
Questions asking for names of authored works:

| Question | Answer |
| --- | --- |
| 1934: What is the play West Side Story based on? | Romeo and Juliet |
| 1976: What is the motto for the Boy Scouts? | Be prepared. |
| 1982: What movie won the Academy Award for best picture in 1989? | Driving Miss Daisy |
| 2080: What peace treaty ended WWI? | Versailles |
| 2102: What American landmark stands on Liberty Island? | Statue of Liberty |
67 Problems
- NE recognition assumes all answers are named entities
- Oversimplifies the generative power of language!
- What about "What kind of flowers did Van Gogh paint?"
- Does not account well for morphological, lexical, and semantic alternations
- Question terms may not exactly match answer terms; connections between alternations of Q and A terms are often not documented in a flat dictionary
- Example: "When was Berlin's Brandenburger Tor erected?" → no guarantee of matching "built"
- Recall suffers
68 LCC Approach: WordNet to the rescue!
- WordNet can be used to inform all three steps of the Q/A process:
- 1. Answer-type recognition (Answer Type Taxonomy)
- 2. Passage retrieval (specificity constraints)
- 3. Answer extraction (recognition of keyword alternations)
- Using WN's lexico-semantic info: examples
- What kind of flowers did Van Gogh paint?
- Answer-type recognition: need to know (a) the answer is a kind of flower, and (b) the sense of the word "flower"
- WordNet encodes 470 hyponyms of flower sense 1, flowers as plants
- Nouns from retrieved passages can be searched against these hyponyms
- When was Berlin's Brandenburger Tor erected?
- Semantic alternation: "erect" is a hyponym of sense 1 of "build"
69 WN for Answer Type Recognition
- Encodes 8707 English concepts to help recognize the expected answer type
- Mapping to parts of WordNet done by hand
- Can connect to Noun, Adj, and/or Verb subhierarchies
70 WN in Passage Retrieval
- Identify relevant passages from text
- Extract keywords from the question, and
- Pass them to the retrieval module
- Specificity filtering of question concepts/keywords
- Focuses the search, improves performance and precision
- Question keywords can be omitted from the search if they are too general
- Specificity is calculated by counting the hyponyms of a given keyword in WordNet (a sketch follows below)
- The count ignores proper names and same-headed concepts
- A keyword is thrown out if its count is above a given threshold (currently 10)
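A minimal sketch of the specificity filter using NLTK's WordNet interface; it simply counts direct hyponyms over all noun senses and drops keywords above the threshold, omitting the proper-name and same-headed-concept exclusions described above.

```python
# Drop keywords that are too general, measured by WordNet hyponym counts.
# Requires: nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def hyponym_count(word):
    return sum(len(synset.hyponyms()) for synset in wn.synsets(word, pos=wn.NOUN))

def filter_keywords(keywords, threshold=10):
    return [k for k in keywords if hyponym_count(k) <= threshold]

for word in ["flower", "treaty", "oceanographer"]:
    print(word, hyponym_count(word))
print(filter_keywords(["oceanographer", "Calypso"]))
```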
71 WN in Answer Extraction
- If keywords alone cannot find an acceptable answer, look for alternations in WordNet!

| Question | Alternation | Answer |
| --- | --- | --- |
| Q196: Who wrote Hamlet? | Morphological: wrote → written | "... before the young playwright has written Hamlet and Shakespeare seizes the opportunity ..." |
| Q136: Who is the queen of Holland? | Lexical: Holland → Netherlands | "... Princess Margriet, sister of Queen Beatrix of the Netherlands, was also present ..." |
| Q196: What is the highest mountain in the world? | Semantic: mountain → peak | "... first African country to send an expedition to Mount Everest, the world's highest peak ..." |
72 Evaluation
- Pasca/Harabagiu (NAACL'01 Workshop) measured the approach using the TREC-8 and TREC-9 test collections
- WN contributions to Answer Type Recognition
- Count the number of questions for which acceptable answers were found: 3GB text collection, 893 questions

| Method | All questions with correct answer type | "What" questions only |
| --- | --- | --- |
| Flat dictionary (baseline) | 227 (32%) | 48 (13%) |
| A-type taxonomy (static) | 445 (64%) | 179 (50%) |
| A-type taxonomy (dynamic) | 463 (67%) | 196 (56%) |
| A-type taxonomy (dynamic + answer patterns) | 533 (76%) | 232 (65%) |
73 Evaluation
- WN contributions to Passage Retrieval
- Impact of keyword alternations:

| Alternations | Precision |
| --- | --- |
| No alternations enabled | 55.3% |
| Lexical alternations enabled | 67.6% |
| Lexical + semantic alternations enabled | 73.7% |
| Morphological expansions enabled | 76.5% |

- Impact of specificity knowledge (questions with the correct answer in the first 5 documents returned):

| Specificity knowledge | TREC-8 | TREC-9 |
| --- | --- | --- |
| Not included | 133 (65%) | 463 (67%) |
| Included | 151 (76%) | 515 (74%) |
74 Going Beyond Word Matching
- Use techniques from artificial intelligence to try to draw inferences from the meanings of the words
- This is a highly unusual and ambitious approach.
- Surprising that it works at all!
- Requires huge amounts of hand-coded information
- Uses notions of proofs and inference from logic
- All birds fly. Robins are birds. Thus, robins fly.
- forall(X) bird(X) -> fly(X)
- forall(X,Y) student(X), enrolled(X,Y) -> school(Y)
75 Inference via a Logic Prover
- The LCC system attempts inference to justify an answer
- Its inference engine is a kind of funny middle ground between logic and pattern matching
- But quite effective: 30% improvement
- Q: When was the internal combustion engine invented?
- A: The first internal-combustion engine was built in 1867.
- invent -> create_mentally -> create -> build
76 COGEX
- World knowledge from:
- WordNet glosses converted to logic forms in the eXtended WordNet (XWN) project
- Lexical chains
- game:n#3 → HYPERNYM → recreation:n#1 → HYPONYM → sport:n#1
- Argentine:a#1 → GLOSS → Argentina:n#1
- NLP axioms to handle complex NPs, coordinations, appositions, equivalence classes for prepositions, etc.
- Named-entity recognizer
- John Galt → HUMAN
- A relaxation mechanism is used to iteratively uncouple predicates and remove terms from the logic forms. Proofs are penalized based on the amount of relaxation involved.
77 Logic Inference Example
- How hot does the inside of an active volcano get?
- get(TEMPERATURE, inside(volcano(active)))
- "lava fragments belched out of the mountain were as hot as 300 degrees Fahrenheit"
- fragments(lava, TEMPERATURE(degrees(300)), belched(out, mountain))
- volcano ISA mountain
- lava ISPARTOF volcano -> lava inside volcano
- fragments of lava HAVEPROPERTIESOF lava
- The needed semantic information is in WordNet definitions, and was successfully translated into a form that was used for rough proofs
78 Axiom Creation
- XWN Axioms
- A major source of world knowledge is a general-purpose knowledge base of more than 50,000 parsed and disambiguated WordNet glosses that are transformed into logical form for use during the course of a proof.
- Gloss
- "Kill" is "to cause to die"
- Logical form
- kill_VB_1(e1,x1,x2) -> cause_VB_1(e1,x1,x3) & to_TO(e1,e2) & die_VB_1(e2,x2,x4)
79 Lexical Chains
- Lexical chains provide an improved source of world knowledge by supplying the logic prover with much-needed axioms to link question keywords with answer concepts.
- Question
- How were biological agents acquired by bin Laden?
- Answer
- "On 8 July 1998, the Italian newspaper Corriere della Serra indicated that members of The World Front for Fighting Jews and Crusaders, which was founded by Bin Laden, purchased three chemical and biological_agent production facilities in ..."
- Lexical chain
- ( v - buy#1, purchase#1 ) HYPERNYM ( v - get#1, acquire#1 )
80 Axiom Selection
- Lexical chains and the XWN knowledge base work together to select and generate the axioms needed for a successful proof when not all the keywords in the question are found in the answer.
- Question
- How did Adolf Hitler die?
- Answer
- Adolf Hitler committed suicide
- The following lexical chain is detected:
- ( n - suicide#1, self-destruction#1, self-annihilation#1 ) GLOSS ( v - kill#1 ) GLOSS ( v - die#1, decease#1, perish#1, go#17, exit#3, pass_away#1, expire#2, pass#25 )
- The following axioms are loaded into the prover:
- exists x2 all e1 x1 (suicide_nn(x1) -> act_nn(x1) & of_in(x1,e1) & kill_vb(e1,x2,x2)).
- exists x3 x4 all e2 x1 x2 (kill_vb(e2,x1,x2) -> cause_vb_2(e1,x1,x3) & to_to(e1,e2) & die_vb(e2,x2,x4)).
81 LCC System References
- The previous set of slides drew information from these sources:
- Pasca and Harabagiu, The Informative Role of WordNet in Open-Domain Question Answering, NAACL 2001 Workshop on WordNet and Other Lexical Resources
- Pasca and Harabagiu, High Performance Question/Answering, SIGIR'01
- Moldovan, Clark, Harabagiu, Maiorano, COGEX: A Logic Prover for Question Answering, HLT-NAACL 2003
- Moldovan, Pasca, Harabagiu, and Surdeanu, Performance issues and error analysis in an open-domain question answering system, ACM Trans. Inf. Syst. 21(2): 133-154 (2003)
- Harabagiu and Maiorano, Abductive Processes for Answer Justification, AAAI Spring Symposium on Mining Answers from Texts and Knowledge Bases, 2002
82 Using Machine Learning in QA
- The following slides are based on:
- Ramakrishnan, Chakrabarti, Paranjpe, Bhattacharyya, Is Question Answering an Acquired Skill?, WWW'04
83 Learning Answer Type Mapping
- Idea: use machine learning techniques to automatically determine answer types and query terms from questions.
- Two kinds of answer types:
- Surface patterns
- An infinite set, so they can't be covered by a lexicon
- Example categories: DATES, NUMBERS, PERSON NAMES, LOCATIONS
- Example patterns: "at DDDD", "in the DDs", "in DDDD", "Xx said"
- A pattern can also be associated with a synset, e.g., sense 7 of the noun "date"
- WordNet synsets
- Consider "name an animal that sleeps upright"
- Answer: horse
84 Determining Answer Types
- The hard ones are "what" and "which" questions.
- Two useful heuristics:
- If the head of the NP appearing before the auxiliary or main verb is not a wh-word, mark it as an a-type clue
- Otherwise, the head of the NP appearing after the auxiliary/main verb is an a-type clue.
85 Learning Answer Types
- Given a QA pair (q, a):
- (name an animal that sleeps upright, horse)
- (1a) See which atype(s) "horse" can map to
- (1b) Look up the hypernyms of "horse" -> S
- (2a) Record the k words to the right of the q-word
- (2b) For each of these k words, look up their synsets
- an, animal, that
- (2c) Increment the counts for those synsets that also appear in S
- Do significance testing (a sketch follows below)
- Compare synset frequencies against a background set
- Retain only those that are significantly more associated with the question word than in general (chi-square)
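A minimal sketch of the chi-square significance test, assuming SciPy: a synset is kept as an answer-type candidate only if it is significantly over-represented in the windows to the right of the question word compared to a background sample.

```python
# 2x2 chi-square test for association between a synset and a question word.
from scipy.stats import chi2_contingency

def synset_is_atype(count_near_qword, total_near_qword,
                    count_background, total_background, alpha=0.001):
    table = [
        [count_near_qword, total_near_qword - count_near_qword],
        [count_background, total_background - count_background],
    ]
    chi2, p_value, dof, expected = chi2_contingency(table)
    over_represented = (count_near_qword / total_near_qword
                        > count_background / total_background)
    return over_represented and p_value < alpha

# Hypothetical counts: a synset appears in 40 of 500 windows after "name",
# versus 200 of 100000 background windows.
print(synset_is_atype(40, 500, 200, 100000))
```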
86 Learning Answer Types
87 Learning to Choose Query Terms
- Which words from the question should be used in the query?
- A tradeoff between precision and recall.
- Example: Tokyo is the capital of which country?
- Want to use "Tokyo" verbatim
- Probably "capital" as well
- But maybe not "country": maybe "nation", or maybe this word won't appear in the retrieved passage at all.
- Also, "country" corresponds to the answer type, so we probably don't want to require it to be in the answer text.
88 Learning to Choose Query Terms
- Features
- POS assigned to the word and its immediate neighbors
- Starts with an uppercase letter
- Is a stopword
- IDF score
- Is an answer-type for this question
- Ambiguity indicators
- Number of possible WordNet senses (NumSense)
- Number of other WordNet synsets that describe this sense (NumLemma), e.g., for buck: stag, deer, doe
- Learner
- A J48 decision tree worked best
89 Learning to Choose Query Terms
- Results
- WordNet ambiguity indicators were very helpful
- Raised accuracy from 71-73% to 80%
- The atype flag improved accuracy by 1-3%
90 Learning to Score Passages
- Given a question, an answer, and a passage (q, a, r):
- Assign +1 if r contains a
- Assign -1 otherwise
- Features
- Do the selected terms s from q appear in r?
- Does r have an answer zone a that does not overlap s?
- Are the distances between tokens in a and s small?
- Does a have a strong WordNet similarity with q's answer type?
- Learner
- Use logistic regression, since it produces a ranking rather than a hard classification into +1 or -1
- Produces a continuous estimate between 0 and 1 (a sketch follows below)
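A minimal sketch of passage scoring with logistic regression, assuming scikit-learn; the feature vectors are hypothetical stand-ins for the features listed above, and the predicted probability is used to re-rank passages.

```python
# Train a logistic regression scorer on (q, a, r) triples and use the
# continuous probability of the positive class to re-rank passages.
from sklearn.linear_model import LogisticRegression

# Each row: [q-terms matched in r, q-terms in the answer zone, avg token distance]
X = [[3, 2, 1.0],
     [1, 0, 6.0],
     [4, 3, 1.5],
     [0, 0, 8.0]]
y = [1, 0, 1, 0]  # 1 if the passage contains the answer, 0 otherwise

model = LogisticRegression().fit(X, y)
scores = model.predict_proba(X)[:, 1]   # continuous estimates in (0, 1)
ranking = sorted(range(len(X)), key=lambda i: -scores[i])
print(ranking, scores)
```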
91 Learning to Score Passages
- Results
- F-scores are low (.33 - .56)
- However, re-ranking greatly improves the rank of the corresponding passages.
- Eliminates many non-answers, pushing better passages towards the top.
92 Learning to Score Passages
93 Computing WordNet Similarity
- Path-based similarity measures are not all that good in WordNet
- 3 hops from entity to artifact
- 3 hops from mammal to elephant
- An alternative:
- Given a target synset t and an answer synset a
- Measure the overlap of nodes on the paths
- from t to all noun roots and
- from a to all noun roots
- Algorithm for computing the similarity of t to a (a sketch follows below):
- If t is not a hypernym of a, assign 0
- Else collect the sets of hypernym synsets of t and a
- Call them Ht and Ha
- Compute the Jaccard overlap: |Ht ∩ Ha| / |Ht ∪ Ha|
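A minimal sketch of the hypernym-overlap similarity using NLTK's WordNet interface; note that counts in WordNet 3.0 differ slightly from the figures on the next slide, which were computed over an older hierarchy.

```python
# Jaccard overlap of the hypernym sets of a target synset t and answer synset a.
# Requires: nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def hypernyms_of(synset):
    """All (transitive) hypernym synsets of the given synset."""
    return set(synset.closure(lambda s: s.hypernyms()))

def atype_similarity(t, a):
    ha = hypernyms_of(a)
    if t not in ha:           # t must be a hypernym of a
        return 0.0
    ht = hypernyms_of(t)
    return len(ht & ha) / len(ht | ha)

mammal = wn.synset("mammal.n.01")
elephant = wn.synset("elephant.n.01")
print(atype_similarity(mammal, elephant))
```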
94 Computing WordNet Similarity
- Algorithm for computing the similarity of t to a: |Ht ∩ Ha| / |Ht ∪ Ha|
[Figure: noun hierarchy entity → object → living thing → organism → animal → chordate → vertebrate → mammal → placental mammal → proboscidean → elephant, with the regions Ht ∩ Ha and Ht ∪ Ha marked.]
- t = mammal, a = elephant: 7/10 = .7
- t = animal, a = elephant: 5/10 = .5
- t = animal, a = mammal: 4/7 = .57
- t = mammal, a = fox: 7/11 = .63
95 System Extension: Definition Questions
- Definition questions ask about the definition or description of a concept:
- Who is John Galt?
- What is anorexia nervosa?
- Many information nuggets are acceptable answers
- Who is George W. Bush?
- "George W. Bush, the 43rd President of the United States ..."
- "George W. Bush defeated Democratic incumbent Ann Richards to become the 46th Governor of the State of Texas ..."
- Scoring
- Any information nugget is acceptable
- Precision score over all information nuggets
96 Definition Detection with Pattern Matching

| Question | Detected answer |
| --- | --- |
| Q386: What is anorexia nervosa? | "cause of anorexia nervosa, an eating disorder..." |
| Q358: What is a meerkat? | "the meerkat, a type of mongoose, thrives in..." |
| Q340: Who is Zebulon Pike? | "in 1806, explorer Zebulon Pike sighted the..." |
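A minimal sketch of appositive-style definition patterns like the ones matched above; the two regexes are illustrative stand-ins, not the system's actual pattern set.

```python
# Find appositive definition candidates such as "<term>, a <definition>".
import re

def definition_candidates(term, text):
    patterns = [
        rf"{re.escape(term)},? (?:a|an|the) ([^,.]+)",
        rf"([^,.]+), (?:a|an|the) {re.escape(term)}",
    ]
    hits = []
    for p in patterns:
        hits += re.findall(p, text, flags=re.IGNORECASE)
    return hits

print(definition_candidates("meerkat",
                            "the meerkat, a type of mongoose, thrives in the Kalahari"))
```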
97 Answer Detection with Concept Expansion
- Enhancement for definition questions
- Identify terms that are semantically related to the phrase being defined
- Use WordNet hypernyms (more general concepts)

| Question | WordNet hypernym | Detected answer candidate |
| --- | --- | --- |
| What is a shaman? | priest, non-Christian priest | "Mathews is the priest or shaman..." |
| What is a nematode? | worm | "nematodes, tiny worms in soil..." |
| What is anise? | herb, herbaceous plant | "anise, rhubarb and other herbs..." |
98 Online QA Examples
- Examples (none work very well)
- AnswerBus
- http://www.answerbus.com
- Ionaut
- http://www.ionaut.com:8400/
- LCC
- http://www.languagecomputer.com/demos/question_answering/index.html