Title: Question Answering Techniques and Systems

Slides: 66

Provided by: danmol

Learn more at: https://www.cs.upc.edu

Transcript and Presenter's Notes

Title: Question Answering Techniques and Systems

1
Question Answering Techniques and Systems

Mihai Surdeanu (TALP)
Marius Pasca (Google - Research)

TALP Research Center, Dep. Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya
surdeanu_at_lsi.upc.es
The work by Marius Pasca (currently mars_at_google.com) was performed as part of his PhD work at Southern Methodist University in Dallas, Texas.
2
Overview

What is Question Answering?
A traditional system
Other relevant approaches
Distributed Question Answering

3
Problem of Question Answering
When was the San Francisco fire?
... were driven over it. After the ceremonial tie was removed - it burned in the San Francisco fire of 1906 - historians believe an unknown Chinese worker probably drove the last steel spike into a wooden tie. If so, it was only ...
What is the nationality of Pope John Paul II?
... stabilize the country with its help, the Catholic hierarchy stoutly held out for pluralism, in large part at the urging of Polish-born Pope John Paul II. When the Pope emphatically defended the Solidarity trade union during a 1987 tour of the ...
Where is the Taj Mahal?
... list of more than 360 cities around the world includes the Great Reef in Australia, the Taj Mahal in India, Chartres Cathedral in France, and Serengeti National Park in Tanzania. The four sites Japan has listed include ...
4
Problem of Question Answering
Input: a natural language question, not a keyword query
What is the nationality of Pope John Paul II?
Output: a short text fragment, not a URL list
... stabilize the country with its help, the Catholic hierarchy stoutly held out for pluralism, in large part at the urging of Polish-born Pope John Paul II. When the Pope emphatically defended the Solidarity trade union during a 1987 tour of the ...
5
Compare with
Document collection
Searching for Etna
Where is Naxos?
Searching for Naxos
What continent is Taormina in?
What is the highest volcano in Europe?
Searching for Taormina
6
Beyond Document Retrieval

Document Retrieval
Users submit queries corresponding to their
information needs.
System returns (voluminous) list of full-length
documents.
It is the responsibility of the users to find
information of interest within the returned
documents.
Open-Domain Question Answering (QA)
Users ask questions in natural language.
What is the highest volcano in Europe?
System returns list of short answers.
Under Mount Etna, the highest volcano
in Europe, perches the fabulous town
Often more useful for specific information needs.

7
Evaluating QA Systems

The National Institute of Standards and Technology (NIST) organizes the yearly Text Retrieval Conference (TREC), which has had a QA track for the past 5 years, from TREC-8 in 1999 to TREC-12 in 2003.
The document set
Newswire documents from the LA Times, San Jose Mercury News, Wall Street Journal, NY Times, etc.: over 1M documents now.
Well-formed lexically, syntactically and semantically (reviewed by professional editors).
The questions
Hundreds of new questions every year; the total is close to 2000 across all TRECs.
Task
Initially: extract at most 5 answers, long (250 bytes) and short (50 bytes).
Now: extract only one exact answer.
Several other sub-tasks were added later: definition, list, context.
Metrics
Mean Reciprocal Rank (MRR): each question is assigned the reciprocal rank of the first correct answer. If the first correct answer is at position k, the score is 1/k.
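
The MRR metric can be made concrete with a small sketch. The function names are hypothetical, and scoring a question with no correct answer as 0 is stated here as an assumption about the convention, since the slide only defines the 1/k case.

```python
from typing import Optional, Sequence


def reciprocal_rank(rank: Optional[int]) -> float:
    """Score one question: 1/k if the first correct answer is at rank k.

    A rank of None means no returned answer was correct; scoring that as 0
    is an assumption of this sketch.
    """
    return 0.0 if rank is None else 1.0 / rank


def mean_reciprocal_rank(first_correct_ranks: Sequence[Optional[int]]) -> float:
    """Average the per-question reciprocal ranks over the whole question set."""
    if not first_correct_ranks:
        return 0.0
    total = sum(reciprocal_rank(r) for r in first_correct_ranks)
    return total / len(first_correct_ranks)


# Four questions: correct answer first, third, never, and second.
example_ranks = [1, 3, None, 2]
example_mrr = mean_reciprocal_rank(example_ranks)
print(round(example_mrr, 4))
```

With at most 5 answers per question, the per-question score can only take the values 1, 1/2, 1/3, 1/4, 1/5 or 0.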

8
Overview

What is Question Answering?
A traditional system
SMU ranked first at TREC-8 and TREC-9
The foundation of LCC's PowerAnswer system (http://www.languagecomputer.com)
Other relevant approaches
Distributed Question Answering

9
QA Block Architecture
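[Figure: block diagram. The question (Q) enters Question Processing, which outputs the question semantics and the keywords; Passage Retrieval sits on top of Document Retrieval and outputs passages; Answer Extraction outputs the answer (A). WordNet, a parser and a named-entity recognizer (NER) each appear twice as supporting resources.]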
10
Question Processing Flow
[Figure: flow diagram. The question (Q) goes through question parsing and construction of the question representation, giving the question semantic representation; answer type detection gives the AT category; keyword selection gives the keywords.]
11
Lexical Terms Examples

Questions approximated by sets of unrelated words (lexical terms)
Similar to bag-of-words IR models

Question (from TREC QA track) | Lexical terms
Q002: What was the monetary value of the Nobel Peace Prize in 1989? | monetary, value, Nobel, Peace, Prize
Q003: What does the Peugeot company manufacture? | Peugeot, company, manufacture
Q004: How much did Mercury spend on advertising in 1993? | Mercury, spend, advertising, 1993
Q005: What is the name of the managing director of Apricot Computer? | name, managing, director, Apricot, Computer
12
Question Stems and Answer Type Examples

Identify the semantic category of expected answers

Question | Question stem | Answer type
Q555: What was the name of Titanic's captain? | What | Person
Q654: What U.S. Government agency registers trademarks? | What | Organization
Q162: What is the capital of Kosovo? | What | City
Q661: How much does one ton of cement cost? | How much | Quantity

Other question stems: Who, Which, Name, How hot...
Other answer types: Country, Number, Product...

13
Building the Question Representation
From the question parse tree: bottom-up traversal with a set of propagation rules
Q006: Why did David Koresh ask the FBI for a word processor?
[Figure: parse tree of Q006, with nonterminals SBARQ, SQ, VP, PP, WHADVP and NP over the tagged words Why/WRB did/VBD David/NNP Koresh/NNP ask/VB the/DT FBI/NNP for/IN a/DT word/NN processor/NN.]
Published in COLING 2000

Propagation rules:
- assign labels to non-skip leaf nodes
- propagate the label of the head child node to the parent node
- link the head child node to the other children nodes

14
Building the Question Representation
From the question parse tree: bottom-up traversal with a set of propagation rules
Q006: Why did David Koresh ask the FBI for a word processor?
[Figure: the same parse tree of Q006, shown together with the resulting question representation, a graph over the nodes REASON, ask, David, Koresh, FBI, word and processor.]
15
Detecting the Expected Answer Type

In some cases, the question stem is sufficient to indicate the answer type (AT):
Why → REASON
When → DATE
In many cases, the question stem is ambiguous. Examples:
What was the name of Titanic's captain?
What U.S. Government agency registers trademarks?
What is the capital of Kosovo?
Solution: select additional question concepts (AT words) that help disambiguate the expected answer type. Examples:
captain
agency
capital

16
AT Detection Algorithm

Select the answer type word from the question representation.
Select the word(s) connected to the question stem. Some content-free words are skipped (e.g. "name").
From the previous set, select the word with the highest connectivity in the question representation.
Map the AT word onto a previously built AT hierarchy.
The AT hierarchy is based on WordNet, with some concepts associated with semantic categories, e.g. writer → PERSON.
Select the AT(s) from the first hypernym(s) associated with a semantic category.
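
The mapping step can be sketched as a walk up a hypernym chain until a concept tagged with a semantic category is reached. This is only an illustration: the tiny HYPERNYM and CATEGORY tables below are hypothetical stand-ins for WordNet and for the AT hierarchy, and the chains in them are invented for the example.

```python
from typing import Optional

# Hypothetical miniature hypernym table standing in for WordNet.
HYPERNYM = {
    "captain": "officer",
    "officer": "person",
    "writer": "communicator",
    "communicator": "person",
    "agency": "organization",
    "capital": "city",
}

# Hypothetical subset of concepts that carry a semantic category.
CATEGORY = {"person": "PERSON", "organization": "ORGANIZATION", "city": "CITY"}

# Question stems that determine the answer type on their own.
STEM_AT = {"why": "REASON", "when": "DATE"}


def answer_type(stem: str, at_word: Optional[str] = None) -> Optional[str]:
    """Return the answer type for a stem, using the AT word when the stem is ambiguous."""
    unambiguous = STEM_AT.get(stem.lower())
    if unambiguous is not None:
        return unambiguous
    concept = at_word
    seen = set()
    # Climb the hypernym chain until a concept with a category is found.
    while concept is not None and concept not in seen:
        if concept in CATEGORY:
            return CATEGORY[concept]
        seen.add(concept)
        concept = HYPERNYM.get(concept)
    return None


print(answer_type("Why"))
print(answer_type("What", "captain"))
print(answer_type("What", "capital"))
```

A word that never reaches a tagged concept yields no answer type, which is one way the unrecoverable errors discussed on the evaluation slide could arise.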

17
Answer Type Hierarchy
[Figure: answer type hierarchy; the only surviving label is PERSON.]
18
Evaluation of Answer Type Hierarchy

Controlled variation of the number of WordNet synsets included in the answer type hierarchy.
Test on 800 TREC questions.

Hierarchy coverage | Precision score (50-byte answers)
0% | 0.296
3% | 0.404
10% | 0.437
25% | 0.451
50% | 0.461

The derivation of the answer type is the main
source of unrecoverable errors in the QA system

19
Keyword Selection

The AT indicates what the question is looking for, but provides insufficient context to locate the answer in a very large document collection.
Lexical terms (keywords) from the question, possibly expanded with lexical/semantic variations, provide the required context.

20
Keyword Selection Algorithm

1. Select all non-stop words in quotations
2. Select all NNP words in recognized named entities
3. Select all complex nominals with their adjectival modifiers
4. Select all other complex nominals
5. Select all nouns with adjectival modifiers
6. Select all other nouns
7. Select all verbs
8. Select the AT word (which was skipped in all previous steps)
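
The ordered heuristics can be sketched as a priority list. To stay self-contained, the sketch assumes every candidate term already carries the label of the heuristic that would select it; in a real system those labels would come from a tagger, a named-entity recognizer and the AT detection step. The label names are hypothetical.

```python
from typing import List, Sequence, Tuple

# One label per heuristic, in the priority order given on the slide.
HEURISTICS = [
    "quoted",
    "named_entity_nnp",
    "complex_nominal_with_adjective",
    "complex_nominal",
    "noun_with_adjective",
    "noun",
    "verb",
    "at_word",
]


def select_keywords(
    annotated_terms: Sequence[Tuple[str, str]], max_heuristic: int = 8
) -> List[str]:
    """Return terms in heuristic priority order, using only the first max_heuristic rules."""
    selected: List[str] = []
    for label in HEURISTICS[:max_heuristic]:
        for term, term_label in annotated_terms:
            if term_label == label and term not in selected:
                selected.append(term)
    return selected


# "What researcher discovered the vaccine against Hepatitis-B?"
# The labels below are assumptions made for this example.
question_terms = [
    ("researcher", "at_word"),
    ("discover", "verb"),
    ("vaccine", "noun"),
    ("Hepatitis-B", "named_entity_nnp"),
]
print(select_keywords(question_terms))
print(select_keywords(question_terms, max_heuristic=6))
```

Restricting the call to the first six heuristics leaves out verbs and the AT word, which gives a retrieval loop room to add keywords later.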

21
Keyword Selection Examples

What researcher discovered the vaccine against
Hepatitis-B?
Hepatitis-B, vaccine, discover, researcher
What is the name of the French oceanographer who
owned Calypso?
Calypso, French, own, oceanographer
What U.S. government agency registers trademarks?
U.S., government, trademarks, register, agency
What is the capital of Kosovo?
Kosovo, capital

22
Passage Retrieval
[Figure: QA block architecture, repeated from slide 9.]
23
Passage Retrieval Architecture
[Figure: flowchart with the components Document Retrieval, Documents, Passage Extraction, Passages, Passage Quality (a Yes/No decision), Keyword Adjustment, Passage Scoring, Passage Ordering and Ranked Passages; the input is the keywords.]
24
Passage Extraction Loop

Passage Extraction Component
Extracts passages that contain all selected keywords
Passage size: dynamic
Start position: dynamic
Passage quality and keyword adjustment
In the first iteration, use the first 6 keyword selection heuristics
If the number of passages is lower than a threshold → the query is too strict → drop a keyword
If the number of passages is higher than a threshold → the query is too relaxed → add a keyword
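
A minimal sketch of this feedback loop follows. It assumes the keywords arrive already ranked by selection priority, uses plain case-insensitive substring matching in place of a real retrieval engine, and takes the two thresholds as parameters; all names are hypothetical.

```python
from typing import List, Sequence, Tuple


def retrieve_passages(paragraphs: Sequence[str], keywords: Sequence[str]) -> List[str]:
    """Keep the paragraphs that contain every keyword (case-insensitive substring test)."""
    return [p for p in paragraphs if all(k.lower() in p.lower() for k in keywords)]


def retrieval_loop(
    paragraphs: Sequence[str],
    ranked_keywords: Sequence[str],
    initial: int,
    min_hits: int,
    max_hits: int,
) -> Tuple[List[str], List[str]]:
    """Drop or add keywords until the number of passages falls between the thresholds."""
    n = max(1, min(initial, len(ranked_keywords)))
    tried = set()
    while True:
        tried.add(n)
        active = list(ranked_keywords[:n])
        hits = retrieve_passages(paragraphs, active)
        if len(hits) < min_hits:
            next_n = n - 1  # query too strict: drop the lowest-priority keyword
        elif len(hits) > max_hits:
            next_n = n + 1  # query too relaxed: add the next keyword
        else:
            return active, hits
        # Stop when no keyword can be dropped or added, or when the loop would oscillate.
        if next_n < 1 or next_n > len(ranked_keywords) or next_n in tried:
            return active, hits
        n = next_n


docs = [
    "The Taj Mahal is in Agra, India.",
    "India lists the Taj Mahal as a heritage site.",
    "Chartres Cathedral is in France.",
]
used, found = retrieval_loop(docs, ["Taj Mahal", "India", "Agra", "heritage"], 4, 1, 2)
print(used, found)
```

The set of already tried keyword counts is a design choice of the sketch: it guarantees termination when dropping one keyword returns too many passages and adding it back returns too few.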

25
Passage Scoring (1/2)

Passages are scored based on keyword windows.
For example, if a question has a set of keywords k1, k2, k3, k4, and in a passage k1 and k2 are matched twice, k3 is matched once, and k4 is not matched, the following windows are built:

[Figure: the same keyword sequence (k1 k2 / k3 k2 k1) drawn four times, labeled Window 1 to Window 4.]
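
One reading of the example, consistent with the count of four windows (2 × 2 × 1), is that a window picks exactly one occurrence of every matched keyword. The sketch below enumerates windows under that assumption; the token offsets are invented for illustration.

```python
from itertools import product
from typing import Dict, List


def build_windows(positions: Dict[str, List[int]]) -> List[Dict[str, int]]:
    """Enumerate windows: one chosen occurrence per matched keyword.

    positions maps each question keyword to the token offsets where it
    matched in the passage; unmatched keywords map to an empty list.
    """
    matched = {k: offsets for k, offsets in positions.items() if offsets}
    if not matched:
        return []
    keys = sorted(matched)
    return [dict(zip(keys, combo)) for combo in product(*(matched[k] for k in keys))]


def window_span(window: Dict[str, int]) -> int:
    """Number of token positions between the two most distant keywords of a window."""
    return max(window.values()) - min(window.values())


# k1 and k2 matched twice, k3 once, k4 not at all (offsets are hypothetical).
example_positions = {"k1": [0, 9], "k2": [1, 8], "k3": [5], "k4": []}
example_windows = build_windows(example_positions)
print(len(example_windows))
for w in example_windows:
    print(w, window_span(w))
```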
26
Passage Scoring (2/2)

Passage ordering is performed using a radix sort that involves three scores: largest SameWordSequenceScore, largest DistanceScore, smallest MissingKeywordScore.
SameWordSequenceScore
The number of words from the question that are recognized in the same sequence in the window
DistanceScore
The number of words that separate the most distant keywords in the window
MissingKeywordScore
The number of unmatched keywords in the window
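
The three-key ordering can be sketched with successive stable sorts, least significant key first, which is the idea behind a radix sort. The sort directions follow the slide literally (largest, largest, smallest); the ScoredPassage type and the sample scores are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ScoredPassage:
    name: str
    same_word_sequence: int
    distance: int
    missing_keywords: int


def order_passages(passages: Sequence[ScoredPassage]) -> List[ScoredPassage]:
    """Order passages with three stable passes, least significant key first."""
    # Python's sort is stable, also with reverse=True, so earlier passes
    # act as tie-breakers for later ones.
    ordered = sorted(passages, key=lambda p: p.missing_keywords)
    ordered = sorted(ordered, key=lambda p: p.distance, reverse=True)
    ordered = sorted(ordered, key=lambda p: p.same_word_sequence, reverse=True)
    return ordered


sample = [
    ScoredPassage("A", same_word_sequence=2, distance=5, missing_keywords=1),
    ScoredPassage("B", same_word_sequence=3, distance=2, missing_keywords=0),
    ScoredPassage("C", same_word_sequence=2, distance=5, missing_keywords=0),
    ScoredPassage("D", same_word_sequence=2, distance=7, missing_keywords=2),
]
print([p.name for p in order_passages(sample)])
```

The same order results from a single sort on a composite key; the three passes are kept only to mirror the radix-sort wording.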

27
Answer Extraction
[Figure: QA block architecture, repeated from slide 9.]
28
Ranking Candidate Answers
Q066: Name the first private citizen to fly in space.

Answer type: Person
Text passage: "Among them was Christa McAuliffe, the first private citizen to fly in space. Karen Allen, best known for her starring role in Raiders of the Lost Ark, plays McAuliffe. Brian Kerwin is featured as shuttle pilot Mike Smith..."
Best candidate answer: Christa McAuliffe

29
Features for Answer Ranking

relNMW: number of question terms matched in the answer passage
relSP: number of question terms matched in the same phrase as the candidate answer
relSS: number of question terms matched in the same sentence as the candidate answer
relFP: flag set to 1 if the candidate answer is followed by a punctuation sign
relOCTW: number of question terms matched, separated from the candidate answer by at most three words and one comma
relSWS: number of terms occurring in the same order in the answer passage as in the question
relDTW: average distance from the candidate answer to the question term matches

Robust heuristics that work on unrestricted text!
30
Answer Ranking based on Machine Learning

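A relative relevance score is computed for each pair of candidates (answer windows):
relPAIR = w_{SWS} \cdot \Delta rel_{SWS} + w_{FP} \cdot \Delta rel_{FP} + w_{OCTW} \cdot \Delta rel_{OCTW} + w_{SP} \cdot \Delta rel_{SP} + w_{SS} \cdot \Delta rel_{SS} + w_{NMW} \cdot \Delta rel_{NMW} + w_{DTW} \cdot \Delta rel_{DTW} + threshold
where each \Delta rel_{X} is the difference between the two candidates' values of feature relX
If relPAIR is positive, then the first candidate of the pair is more relevant
A perceptron model is used to learn the weights
Published in SIGIR 2001
MRR scores in the 50% range for short answers and in the 60% range for long answers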
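
The pairwise score and its training can be sketched as follows. The scoring function is a weighted sum of feature differences plus a threshold; the update shown is the textbook perceptron rule, used here as an assumption because the slide does not give the training details. The feature values are invented.

```python
from typing import Dict, Tuple

FEATURES = ["SWS", "FP", "OCTW", "SP", "SS", "NMW", "DTW"]


def rel_pair(
    first: Dict[str, float],
    second: Dict[str, float],
    weights: Dict[str, float],
    threshold: float,
) -> float:
    """Weighted sum of the feature differences between two candidates, plus the threshold."""
    return sum(weights[f] * (first[f] - second[f]) for f in FEATURES) + threshold


def perceptron_update(
    first: Dict[str, float],
    second: Dict[str, float],
    first_is_better: bool,
    weights: Dict[str, float],
    threshold: float,
    learning_rate: float = 1.0,
) -> Tuple[Dict[str, float], float]:
    """One textbook perceptron step; returns the (possibly unchanged) weights and threshold."""
    target = 1 if first_is_better else -1
    predicted = 1 if rel_pair(first, second, weights, threshold) > 0 else -1
    if predicted == target:
        return dict(weights), threshold
    new_weights = {
        f: weights[f] + learning_rate * target * (first[f] - second[f]) for f in FEATURES
    }
    return new_weights, threshold + learning_rate * target


# Hypothetical feature values for two candidate answer windows.
cand_a = {"SWS": 3, "FP": 1, "OCTW": 2, "SP": 1, "SS": 2, "NMW": 3, "DTW": 2.0}
cand_b = {"SWS": 1, "FP": 0, "OCTW": 0, "SP": 0, "SS": 1, "NMW": 2, "DTW": 6.0}
w0 = {f: 0.0 for f in FEATURES}
w1, t1 = perceptron_update(cand_a, cand_b, True, w0, 0.0)
print(rel_pair(cand_a, cand_b, w1, t1))
```

After one update on this pair the DTW weight is negative, so a smaller average distance to the question terms counts in a candidate's favor.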

31
Evaluation on the Web

Test on 350 questions from TREC (Q250-Q600)
Extract 250-byte answers

Metric | Google | Answer extraction from Google | AltaVista | Answer extraction from AltaVista
Precision score | 0.29 | 0.44 | 0.15 | 0.37
Questions with a correct answer among top 5 returned answers | 0.44 | 0.57 | 0.27 | 0.45
32
System Extension: Answer Justification

Experiments with Open-Domain Textual Question
Answering. Sanda Harabagiu, Marius Pasca and
Steve Maiorano.
Answer justification using unnamed relations
extracted from the question representation and
the answer representation (constructed through a
similar process).

33
System Extension: Definition Questions

Definition questions ask about the definition or description of a concept:
Who is John Galt?
What is anorexia nervosa?
Many information nuggets are acceptable answers
Who is George W. Bush?
George W. Bush, the 43rd President of the United States
George W. Bush defeated Democratic incumbent Ann Richards to become the 46th Governor of the State of Texas
Scoring
Any information nugget is acceptable
Precision score over all information nuggets

34
Answer Detection with Pattern Matching

For Definition questions

Question | Matched text
Q386: What is anorexia nervosa? | cause of anorexia nervosa, an eating disorder...
Q358: What is a meerkat? | the meerkat, a type of mongoose, thrives in...
Q340: Who is Zebulon Pike? | in 1806, explorer Zebulon Pike sighted the...
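
The three examples suggest two surface patterns: an apposition right after the term, and a lowercase descriptor right before a person's name. The regular expressions below are hypothetical illustrations of that idea, not the system's actual pattern set.

```python
import re
from typing import List


def definition_candidates(stem: str, term: str, text: str) -> List[str]:
    """Return text fragments that match a definition pattern for the given term."""
    t = re.escape(term)
    if stem.lower() == "who":
        # Lowercase descriptor directly before the name, e.g. "explorer Zebulon Pike".
        patterns = [rf"\b([a-z][a-z-]*)\s+{t}\b"]
    else:
        # Apposition directly after the term, e.g. "the meerkat, a type of mongoose,".
        patterns = [rf"{t},\s+((?:an?|the)\s+[^,.;]+)"]
    found: List[str] = []
    for pattern in patterns:
        for match in re.finditer(pattern, text):
            found.append(match.group(1).strip())
    return found


print(definition_candidates("What", "anorexia nervosa",
                            "cause of anorexia nervosa, an eating disorder..."))
print(definition_candidates("What", "meerkat",
                            "the meerkat, a type of mongoose, thrives in..."))
print(definition_candidates("Who", "Zebulon Pike",
                            "in 1806, explorer Zebulon Pike sighted the..."))
```

Choosing the pattern by question stem keeps the descriptor pattern from firing on function words such as "of" in the first example.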
35
Answer Detection with Concept Expansion

Enhancement for Definition questions
Identify terms that are semantically related to the phrase to define
WordNet hypernyms (more general concepts)

Question | WordNet hypernym | Detected answer candidate
What is a shaman? | priest, non-Christian priest | Mathews is the priest or shaman
What is a nematode? | worm | nematodes, tiny worms in soil
What is anise? | herb, herbaceous plant | anise, rhubarb and other herbs
Published in AAAI Spring Symposium 2002
36
Evaluation on Definition Questions

Determine the impact of answer type detection with pattern matching and concept expansion
Test on the Definition questions from TREC-9 and TREC-10 (approx. 200 questions)
Extract 50-byte answers
Results
Precision score: 0.56
Questions with a correct answer among top 5 returned answers: 0.67

37
References

Marius Pasca. High-Performance, Open-Domain
Question Answering from Large Text Collections,
Ph.D. Thesis, Computer Science and Engineering
Department, Southern Methodist University,
Defended September 2001, Dallas, Texas
Marius Pasca. Open-Domain Question Answering from
Large Text Collections, Center for the Study of
Language and Information (CSLI Publications,
series Studies in Computational Linguistics),
Stanford, California, Distributed by the
University of Chicago Press, ISBN (Paperback)
1575864282, ISBN (Cloth) 1575864274. 2003

38
Overview

What is Question Answering?
A traditional system
Other relevant approaches
LCC's PowerAnswer COGEX
IBM's PIQUANT
CMU's Javelin
ISI's TextMap
BBN's AQUA
Distributed Question Answering

39
PowerAnswer COGEX (1/2)

Automated reasoning for QA: A → Q, using a logic prover. Facilitates both answer validation and answer extraction.
Both question and answer(s) are transformed into logic forms. Example:
Heavy selling of Standard & Poor's 500-stock index futures in Chicago relentlessly beat stocks downwards.
Heavy_JJ(x1) selling_NN(x1) of_IN(x1,x6) Standard_NN(x2) &_CC(x13,x2,x3) Poor(x3) 's_POS(x6,x13) 500-stock_JJ(x6) index_NN(x4) futures(x5) nn_NNC(x6,x4,x5) in_IN(x1,x8) Chicago_NNP(x8) relentlessly_RB(e12) beat_VB(e12,x1,x9) stocks_NN(x9) downward_RB(e12)

40
PowerAnswer COGEX (2/2)

World knowledge from:
WordNet glosses converted to logic forms in the eXtended WordNet (XWN) project (http://www.utdallas.edu/moldovan)
Lexical chains
game:n#3 → HYPERNYM → recreation:n#1 → HYPONYM → sport:n#1
Argentine:a#1 → GLOSS → Argentina:n#1
NLP axioms to handle complex NPs, coordinations, appositions, equivalence classes for prepositions, etc.
Named-entity recognizer
John Galt → HUMAN
A relaxation mechanism is used to iteratively uncouple predicates and remove terms from the LFs. The proofs are penalized based on the amount of relaxation involved.

41
IBM's Piquant

Question processing is conceptually similar to SMU's, but a series of different strategies (agents) is available for answer extraction. For each question type, multiple agents might run in parallel.
A reasoning engine and general-purpose ontology from Cyc are used as a sanity checker.
Answer resolution: the remaining answers are normalized and a voting strategy is used to select the correct (meaning the most redundant) answer.
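
Normalization followed by voting can be sketched in a few lines. The normalization used here (lowercasing, dropping periods, collapsing whitespace) is a deliberately crude assumption; the slide does not describe the actual normalization, and the sample answers are invented.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def normalize(answer: str) -> str:
    """Crude canonical form: lowercase, no periods, single spaces."""
    return " ".join(answer.lower().replace(".", "").split())


def vote(agent_answers: Sequence[str]) -> Optional[Tuple[str, int]]:
    """Return the most frequent normalized answer and its vote count."""
    counts = Counter(normalize(a) for a in agent_answers)
    if not counts:
        return None
    return counts.most_common(1)[0]


# Answers proposed by four hypothetical agents for the same question.
print(vote(["Mount Etna", "mount etna", "Mt. Vesuvius", "Mount  Etna"]))
```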

42
Piquant QA Agents

Predictive Annotation Agent
Predictive annotation: the technique of indexing named entities and other NL constructs along with lexical terms. Lemur has built-in support for this now.
General-purpose agent, used for almost all question types.
Statistical Query Agent
Derived from a probabilistic IR model, also developed at IBM.
Also general-purpose.
Description Query
Generic descriptions: appositions, parenthetical expressions.
Applied mostly to definition questions.
Structured Knowledge Agent
Answers from WordNet/Cyc.
Applied whenever possible.
Pattern-Based Agent
Looks for specific syntactic patterns based on the question form.
Applied when the answer is expected in a well-structured form.
Dossier Agent
For "Who is X?" questions.
A dynamic set of factual questions used to learn information nuggets about persons.

43
Pattern-based Agent

Motivation: some questions (with or without AT) indicate that the answer might be in a structured form
"What does Knight Rider publish?" → transitive verb, missing object.
Knight Rider publishes X.
Patterns are generated:
From a static pattern repository, e.g. birth and death dates recognition.
Dynamically from the question structure.
Matching of the expected answer pattern with the actual answer text is not at word level, but at a higher linguistic level based on full parse trees (see IE lecture).

44
Dossier Agent

Addresses "Who is X?" questions.
Initially generates a series of generic questions:
When was X born?
What was X's profession?
Future iterations are decided dynamically based on the previous answers:
If X's profession is "writer", the next question is "What did X write?"
A static ontology of biographical questions is used.

45
Cyc Sanity Checker

Post-processing component that:
Rejects insane answers
How much does a grey wolf weigh?
300 tons
A grey wolf IS-A wolf. The weight of a wolf is known in Cyc.
Cyc returns SANE, INSANE, or DON'T KNOW.
Boosts answer confidence when the answer is SANE.
Typically called for numerical answer types:
What is the population of Maryland?
How much does a grey wolf weigh?
How high is Mt. Hood?
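
The three-valued check can be sketched with a toy knowledge base. Nothing below reflects the real Cyc interface or its contents: the IS_A and KNOWN_RANGES tables are hypothetical, and the weight range is a made-up placeholder chosen only so that "300 tons" falls outside it.

```python
from typing import Dict, Tuple

# Hypothetical IS-A links and plausible-value ranges (placeholder numbers, not Cyc data).
IS_A: Dict[str, str] = {"grey wolf": "wolf"}
KNOWN_RANGES: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("wolf", "weight_kg"): (15.0, 90.0),
}


def sanity_check(entity: str, attribute: str, value: float) -> str:
    """Return SANE, INSANE or DON'T KNOW for a candidate numerical answer."""
    concept = entity
    visited = set()
    # Follow IS-A links until some ancestor has a known range for the attribute.
    while concept is not None and concept not in visited:
        visited.add(concept)
        bounds = KNOWN_RANGES.get((concept, attribute))
        if bounds is not None:
            low, high = bounds
            return "SANE" if low <= value <= high else "INSANE"
        concept = IS_A.get(concept)
    return "DON'T KNOW"


print(sanity_check("grey wolf", "weight_kg", 300_000.0))  # 300 tons expressed in kg
print(sanity_check("grey wolf", "weight_kg", 40.0))
print(sanity_check("Mt. Hood", "height_m", 1.0))
```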

46
Answer Resolution

Called when multiple agents are applied to the same question. Distribution of agents: the predictive-annotation and the statistical agents are by far the most common.
Each agent provides a canonical answer (e.g. a normalized named entity) and a confidence score.
The final confidence for each candidate answer is computed using an ML model (SVM).

47
CMU's Javelin

The architecture combines SMU's and IBM's approaches.
Question processing is close to SMU's approach.
The passage retrieval loop is conceptually similar to SMU's, but with an elegant implementation.
Multiple answer strategies, similar to IBM's system. All of them are based on ML models (k-nearest neighbours, decision trees) that use shallow-text features (close to SMU's).
Answer voting, similar to IBM's, is used to exploit answer redundancy.

48
Javelin's Retrieval Strategist

Implements passage retrieval, including the passage retrieval loop.
Uses the Inquery IR system, probably Lemur by now.
The retrieval loop initially requires all keywords in close proximity of each other (stricter than SMU).
Subsequent iterations relax the following query terms:
Proximity for all question keywords: 20, 100, 250, AND
Phrase proximity for phrase operators: less than 3 words or PHRASE
Phrase proximity for named entities: less than 3 words or PHRASE
Inclusion/exclusion of the AT word
Accuracy for TREC-11 queries (how many questions had at least one correct document in the top N documents):
Top 30 docs: 80%
Top 60 docs: 85%
Top 120 docs: 86%

49
ISI's TextMap: Pattern-Based QA

Examples
Who invented the cotton gin?
<who> invented the cotton gin
<who>'s invention of the cotton gin
<who> received a patent for the cotton gin
How did Mahatma Gandhi die?
Mahatma Gandhi died <how>
Mahatma Gandhi drowned
<who> assassinated Mahatma Gandhi
Patterns are generated from the question form (similar to IBM), learned using a pattern discovery mechanism, or added manually to a pattern repository.
The pattern discovery mechanism performs a series of generalizations from annotated examples:
Babe Ruth was born in Baltimore, on February 6, 1895.
PERSON was born g in DATE

50
TextMap: QA as Machine Translation

In machine translation, one collects translation pairs (s, d) and learns a model of how to transform the source s into the destination d.
QA is redefined in a similar way: collect question-answer pairs (a, q) and learn a model that computes the probability that a question is generated from the given answer, p(q | parsetree(a)). The correct answer maximizes this probability.
Only the subsets of the answer parse trees where the answer lies are used for training (not the whole sentence).
An off-the-shelf machine translation package (Giza) is used to train the model.

51
TextMap: Exploiting the Data Redundancy

Additional knowledge resources are used whenever applicable:
WordNet glosses
What is a meerkat?
www.acronymfinder.com
What is ARDA?
Etc.
The known answers are then simply searched for in the document collection together with the question keywords.
Google is used for answer redundancy
TREC and the Web (through Google) are searched in parallel.
The final answer is selected using a maximum entropy ML model.
IBM introduced redundancy for QA agents; ISI uses data redundancy.

52
BBN's AQUA

Factual system: converts both question and answer to a semantic form (close to SMU's)
Machine learning is used to measure the similarity of the two representations.
Was ranked best at the TREC definition pilot organized before TREC-12
Definition system: conceptually close to SMU's
Had pronominal and nominal coreference resolution
Used a (probably) better parser (Charniak)
Post-ranking of candidate answers using a tf-idf model

53
Overview

What is Question Answering?
A traditional system
Other relevant approaches
Distributed Question Answering

54
Sequential Q/A Architecture
[Figure: sequential pipeline. Question → Question Processing → Keywords → Paragraph Retrieval → Paragraphs → Paragraph Scoring → Paragraph Ordering → Accepted Paragraphs → Answer Processing → Answers.]
55
Sequential Architecture Analysis

Module timing analysis

Analysis conclusions
Performance bottleneck modules have well-specified resource requirements → fit for DLB
Iterative tasks → fit for partitioning
Reduced inter-module communication → effective module migration/partitioning

56
Inter-Question Parallelism (1)
[Figure: N nodes (Node 1 ... Node N), reached through the Internet/DNS and joined by a local interconnection network; each node runs a Question Dispatcher, a Load Monitor and Q/A tasks.]
57
Inter-Question Parallelism (2)

Question dispatcher
Improves upon the DNS blind allocation
Allocates a new question to the processor p best
fit for the average question. Processor p
minimizes
Recovers from failed questions
Load monitor
Updates and broadcasts local load
Receives remote load information
Detects system configuration changes

58
Intra-Question Parallelism (1)
[Figure: Question → Question Processing → Keywords → Paragraph Retrieval Dispatcher, which feeds k parallel Paragraph Retrieval (1 ... k) and Paragraph Scoring (1 ... k) modules; their paragraphs go to Paragraph Merging. A Load Monitor is also shown.]
59
Intra-Question Parallelism (2)
[Figure: Paragraphs → Paragraph Ordering → Accepted Paragraphs → Answer Processing Dispatcher, which feeds n parallel Answer Processing (1 ... n) modules; their unranked answers go through Answer Merging and Answer Sorting to give the Answers. A Load Monitor is also shown.]
60
Meta-Scheduling Algorithm

metaScheduler(task, loadFunction, underloadCondition)
  select all processors p for which underloadCondition(p) is true
  if none selected, then select the processor p with the smallest value of loadFunction(p)
  assign to each selected processor p a weight w_p based on its current load
  assign to each selected processor p a fraction w_p of the global task
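
The meta-scheduler can be sketched as below. The slide does not say how the weight is derived from the load, so the sketch assumes a weight inversely related to the load, normalized so that the fractions sum to 1; the processor names and load values are hypothetical.

```python
from typing import Callable, Dict, Sequence


def meta_schedule(
    task_size: float,
    processors: Sequence[str],
    load_function: Callable[[str], float],
    underload_condition: Callable[[str], bool],
) -> Dict[str, float]:
    """Split a task among underloaded processors, or give it all to the least loaded one."""
    selected = [p for p in processors if underload_condition(p)]
    if not selected:
        selected = [min(processors, key=load_function)]
    # Assumed weighting: lighter load means a larger share of the task.
    raw = {p: 1.0 / (1.0 + load_function(p)) for p in selected}
    total = sum(raw.values())
    return {p: task_size * raw[p] / total for p in selected}


# Hypothetical loads for three processors.
loads = {"P1": 0.0, "P2": 1.0, "P3": 9.0}
shares = meta_schedule(300, list(loads), loads.__getitem__, lambda p: loads[p] < 2.0)
print(shares)
```

Passing the load function and the underload condition as arguments matches the signature on the slide and lets the same scheduler serve different modules with different resource measures.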

61
Migration Example
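[Figure: timeline with processors P1, P2, ..., Pn on one axis and time on the other, showing the modules QP, PR, PS, PO and AP of a question placed on the processors over time.]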

62
Partitioning Example
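[Figure: timeline with processors P1, P2, ..., Pn on one axis and time on the other: QP, then PR and PS partitioned into PR1 ... PRn and PS1 ... PSn, then PO, then AP partitioned into AP1 ... APn.]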

63
Inter-Question Parallelism: System Throughput
64
Intra-Question Parallelism
65
End
Gràcies! (Thank you!)
