Title: Question Answering
1 Question Answering
Marti Hearst, November 14, 2005
2 Question Answering
- Outline
- Introduction to QA
- A typical full-fledged QA system
- A very simple system, in response to this
- An intermediate approach
- Incorporating a reasoning system
- Machine Learning of mappings
- Other question types (e.g., biography, definitions)
3 A Spectrum of Search Types
- What is the typical height of a giraffe?
- What are some good ideas for landscaping my client's yard?
- What are some promising untried treatments for Raynaud's disease?
4 Beyond Document Retrieval
- Document Retrieval
- Users submit queries corresponding to their information needs.
- System returns a (voluminous) list of full-length documents.
- It is the responsibility of the users to find information of interest within the returned documents.
- Open-Domain Question Answering (QA)
- Users ask questions in natural language.
- What is the highest volcano in Europe?
- System returns a list of short answers.
- "... Under Mount Etna, the highest volcano in Europe, perches the fabulous town ..."
- A real use for NLP
5 Questions and Answers
- What is the height of a typical giraffe?
- The result can be a simple answer, extracted from existing web pages.
- Can specify with keywords or a natural language query.
- However, most web search engines are not set up to handle questions properly.
- Get different results using a question vs. keywords.
10 The Problem of Question Answering
Natural language questions, not keyword queries:
What is the nationality of Pope John Paul II?
Short text fragments, not URL lists:
"... stabilize the country with its help, the Catholic hierarchy stoutly held out for pluralism, in large part at the urging of Polish-born Pope John Paul II. When the Pope emphatically defended the Solidarity trade union during a 1987 tour of the ..."
11 Question Answering from text
- With massive collections of full-text documents, simply finding relevant documents is of limited use: we want answers.
- QA: give the user a (short) answer to their question, perhaps supported by evidence.
- An alternative to standard IR.
- The first problem area in IR where NLP is really making a difference.
12 People want to ask questions
Examples from AltaVista query log:
- who invented surf music?
- how to make stink bombs
- where are the snowdens of yesteryear?
- which english translation of the bible is used in official catholic liturgies?
- how to do clayart
- how to copy psx
- how tall is the sears tower?
Examples from Excite query log (12/1999):
- how can i find someone in texas
- where can i find information on puritan religion?
- what are the 7 wonders of the world
- how can i eliminate stress
- What vacuum cleaner does Consumers Guide recommend
13 A Brief (Academic) History
- In some sense question answering is not a new research area
- Question answering systems can be found in many areas of NLP research, including:
- Natural language database systems
- A lot of early NLP work on these
- Problem-solving systems
- STUDENT (Winograd 77)
- LUNAR (Woods & Kaplan 77)
- Spoken dialog systems
- Currently very active and commercially relevant
- But the focus on open-domain QA is new
- First modern system: MURAX (Kupiec, SIGIR'93)
- Trivial Pursuit questions
- Encyclopedia answers
- FAQFinder (Burke et al. 97)
- TREC QA competition (NIST, 1999-present)
14 AskJeeves
- AskJeeves is probably the most hyped example of question answering
- How it used to work:
- Do pattern matching to match a question to their own knowledge base of questions
- If a match is found, return a human-curated answer to that known question
- If that fails, fall back to regular web search
- (Seems to be more of a meta-search engine now)
- A potentially interesting middle ground, but a fairly weak shadow of real QA
15 Question Answering at TREC
- The question answering competition at TREC consists of answering a set of 500 fact-based questions, e.g., "When was Mozart born?"
- Has really pushed the field forward.
- The document set
- Newswire textual documents from the LA Times, San Jose Mercury News, Wall Street Journal, NY Times, etc.; over 1M documents now.
- Well-formed lexically, syntactically, and semantically (they were reviewed by professional editors).
- The questions
- Hundreds of new questions every year; the total is 2400
- Task
- Initially: extract at most 5 answers, long (250 bytes) and short (50 bytes).
- Now: extract only one exact answer.
- Several other sub-tasks added later: definition, list, biography.
16 Sample TREC questions
1. Who is the author of the book, "The Iron Lady: A Biography of Margaret Thatcher"?
2. What was the monetary value of the Nobel Peace Prize in 1989?
3. What does the Peugeot company manufacture?
4. How much did Mercury spend on advertising in 1993?
5. What is the name of the managing director of Apricot Computer?
6. Why did David Koresh ask the FBI for a word processor?
7. What is the name of the rare neurological disease with symptoms such as involuntary movements (tics), swearing, and incoherent vocalizations (grunts, shouts, etc.)?
17 TREC Scoring
- For the first three years, systems were allowed to return 5 ranked answer snippets (50/250 bytes) for each question.
- Mean Reciprocal Rank (MRR) scoring
- Each question is assigned the reciprocal rank of the first correct answer: if the correct answer is at position k, the score is 1/k.
- 1, 0.5, 0.33, 0.25, 0.2, 0 for positions 1, 2, 3, 4, 5, and 6 or lower (a minimal sketch follows below)
- Mainly Named Entity answers (person, place, date, ...)
- From 2002 on, systems are only allowed to return a single exact answer, and the notion of confidence has been introduced.
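A minimal sketch of MRR scoring (not the official TREC scorer): each question contributes the reciprocal rank of its first correct answer, or 0 if none of the returned snippets is correct.

```python
# MRR sketch: ranked_answers is the system's ranked list for one question,
# is_correct is a predicate standing in for the TREC answer judgments.

def reciprocal_rank(ranked_answers, is_correct):
    for k, answer in enumerate(ranked_answers, start=1):
        if is_correct(answer):
            return 1.0 / k
    return 0.0

def mean_reciprocal_rank(runs):
    """runs: one (ranked_answers, is_correct) pair per question."""
    return sum(reciprocal_rank(r, c) for r, c in runs) / len(runs)

# Correct answers at ranks 1 and 3, plus one miss -> (1 + 1/3 + 0) / 3 ≈ 0.44
print(mean_reciprocal_rank([
    (["1756", "1791"], lambda a: a == "1756"),
    (["Paris", "Rome", "Vienna"], lambda a: a == "Vienna"),
    (["Etna"], lambda a: a == "Everest"),
]))
```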
18 Top Performing Systems
- In 2003, the best performing systems at TREC could answer approximately 60-70% of the questions
- Approaches and successes have varied a fair deal
- Knowledge-rich approaches, using a vast array of NLP techniques, stole the show in 2000-2003
- Notably Harabagiu, Moldovan et al. (SMU/UTD/LCC)
- Statistical systems starting to catch up
- The AskMSR system stressed how much could be achieved by very simple methods with enough text (and now various copycats)
- People are experimenting with machine learning methods
- A middle ground is to use a large collection of surface matching patterns (ISI)
19 Example QA System
- This system contains many components used by other systems, but is more complex in some ways
- Most work completed in 2001; there have been advances by this group and others since then.
- Next slides based mainly on:
- Pasca and Harabagiu, High-Performance Question Answering from Large Text Collections, SIGIR'01.
- Pasca and Harabagiu, Answer Mining from Online Documents, ACL'01.
- Harabagiu, Pasca, Maiorano, Experiments with Open-Domain Textual Question Answering, COLING'00.
20 QA Block Architecture
[Block diagram: the question Q goes to Question Processing (which uses WordNet, a parser, and NER) to produce keywords and question semantics; Passage Retrieval (backed by Document Retrieval) uses the keywords to produce passages; Answer Extraction (which also uses WordNet, a parser, and NER) produces the answer A.]
21 Question Processing Flow
[Flow diagram: the question Q is parsed and a question representation is constructed; answer type detection yields the AT category, keyword selection yields the keywords, and together they form the question's semantic representation.]
22 Question Stems and Answer Types
Identify the semantic category of expected answers.

| Question | Question stem | Answer type |
| --- | --- | --- |
| Q555: What was the name of Titanic's captain? | What | Person |
| Q654: What U.S. Government agency registers trademarks? | What | Organization |
| Q162: What is the capital of Kosovo? | What | City |
| Q661: How much does one ton of cement cost? | How much | Quantity |

- Other question stems: Who, Which, Name, How hot...
- Other answer types: Country, Number, Product...
23 Detecting the Expected Answer Type
- In some cases, the question stem is sufficient to indicate the answer type (AT)
- Why → REASON
- When → DATE
- In many cases, the question stem is ambiguous
- Examples
- What was the name of Titanic's captain?
- What U.S. Government agency registers trademarks?
- What is the capital of Kosovo?
- Solution: select additional question concepts (AT words) that help disambiguate the expected answer type
- Examples
- captain
- agency
- capital
24 Answer Type Taxonomy
- Encodes 8707 English concepts to help recognize the expected answer type
- Mapping to parts of WordNet done by hand
- Can connect to Noun, Adj, and/or Verb subhierarchies
25 Answer Type Detection Algorithm
- Select the answer type word from the question representation.
- Select the word(s) connected to the question. Some content-free words are skipped (e.g., "name").
- From the previous set, select the word with the highest connectivity in the question representation.
- Map the AT word into a previously built AT hierarchy
- The AT hierarchy is based on WordNet, with some concepts associated with semantic categories, e.g., writer → PERSON.
- Select the AT(s) from the first hypernym(s) associated with a semantic category.
26 Answer Type Hierarchy
[Figure: fragment of the answer type hierarchy for the PERSON category.]
27 Evaluation of Answer Type Hierarchy
- This evaluation was done in 2001
- Controlled the variation of the number of WordNet synsets included in the answer type hierarchy.
- Test on 800 TREC questions.

| Hierarchy coverage | Precision score (50-byte answers) |
| --- | --- |
| 0 | 0.296 |
| 3 | 0.404 |
| 10 | 0.437 |
| 25 | 0.451 |
| 50 | 0.461 |

- The derivation of the answer type is the main source of unrecoverable errors in the QA system
28 Keyword Selection
- The answer type indicates what the question is looking for, but provides insufficient context to locate the answer in a very large document collection
- Lexical terms (keywords) from the question, possibly expanded with lexical/semantic variations, provide the required context.
29 Lexical Term Extraction
- Questions approximated by sets of unrelated words (lexical terms)
- Similar to bag-of-words IR models

| Question (from TREC QA track) | Lexical terms |
| --- | --- |
| Q002: What was the monetary value of the Nobel Peace Prize in 1989? | monetary, value, Nobel, Peace, Prize |
| Q003: What does the Peugeot company manufacture? | Peugeot, company, manufacture |
| Q004: How much did Mercury spend on advertising in 1993? | Mercury, spend, advertising, 1993 |
| Q005: What is the name of the managing director of Apricot Computer? | name, managing, director, Apricot, Computer |
30 Keyword Selection Algorithm
1. Select all non-stopwords in quotations
2. Select all NNP words in recognized named entities
3. Select all complex nominals with their adjectival modifiers
4. Select all other complex nominals
5. Select all nouns with adjectival modifiers
6. Select all other nouns
7. Select all verbs
8. Select the AT word (which was skipped in all previous steps)
(A simplified sketch of these heuristics follows below.)
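A minimal sketch of the ordered selection heuristics, assuming NLTK for tokenization and POS tagging; it approximates named entities with proper-noun tags and complex nominals with noun/adjective tags, so it is only a rough stand-in for the parser- and NER-based original.

```python
# Simplified keyword selection: earlier heuristics contribute keywords first.
# Requires: nltk.download('punkt'), nltk.download('averaged_perceptron_tagger'),
# nltk.download('stopwords')
import re
import nltk
from nltk.corpus import stopwords

def select_keywords(question):
    keywords = []

    def add(words):
        for w in words:
            if w.lower() not in keywords:
                keywords.append(w.lower())

    stop = set(stopwords.words("english"))
    # 1. Non-stopwords inside quotations
    for quoted in re.findall(r'"([^"]+)"', question):
        add(w for w in quoted.split() if w.lower() not in stop)

    tagged = nltk.pos_tag(nltk.word_tokenize(question))
    # 2. Proper nouns (rough stand-in for words in recognized named entities)
    add(w for w, t in tagged if t in ("NNP", "NNPS"))
    # 3-6. Nouns and their adjectival modifiers (stand-in for complex nominals)
    add(w for w, t in tagged if t.startswith("JJ") or t.startswith("NN"))
    # 7. Verbs, excluding auxiliaries caught by the stopword list
    add(w for w, t in tagged if t.startswith("VB") and w.lower() not in stop)
    return keywords

print(select_keywords("What researcher discovered the vaccine against Hepatitis-B?"))
```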
31 Keyword Selection Examples
- What researcher discovered the vaccine against Hepatitis-B?
- Hepatitis-B, vaccine, discover, researcher
- What is the name of the French oceanographer who owned Calypso?
- Calypso, French, own, oceanographer
- What U.S. government agency registers trademarks?
- U.S., government, trademarks, register, agency
- What is the capital of Kosovo?
- Kosovo, capital
32 Passage Retrieval
[Block diagram from slide 20, repeated to situate the Passage Retrieval stage between Question Processing and Answer Extraction.]
33 Passage Extraction Loop
- Passage Extraction Component
- Extracts passages that contain all selected keywords
- Passage size is dynamic
- Start position is dynamic
- Passage quality and keyword adjustment
- In the first iteration, use the first 6 keyword selection heuristics
- If the number of passages is lower than a threshold → the query is too strict → drop a keyword
- If the number of passages is higher than a threshold → the query is too relaxed → add a keyword
34 Passage Retrieval Architecture
[Flow diagram: Document Retrieval returns documents; Passage Extraction uses the keywords to produce passages; a passage quality check either sends the keywords back through Keyword Adjustment (no) or passes the passages on to Passage Scoring and Passage Ordering (yes), which output ranked passages.]
35 Passage Scoring
- Passages are scored based on keyword windows
- For example, if a question has a set of keywords {k1, k2, k3, k4}, and in a passage k1 and k2 are matched twice, k3 is matched once, and k4 is not matched, four windows are built, one for each combination of the matched occurrences of k1 and k2.
[Figure: Windows 1-4 built over the passage fragment "... k1 k2 ... k3 k2 k1 ...".]
36 Passage Scoring
- Passage ordering is performed using a radix sort that involves three scores (a sketch follows below):
- SameWordSequenceScore (largest)
- Computes the number of words from the question that are recognized in the same sequence in the window
- DistanceScore (largest)
- The number of words that separate the most distant keywords in the window
- MissingKeywordScore (smallest)
- The number of unmatched keywords in the window
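A minimal sketch of the three-score ordering, assuming each window is just a list of tokens; Python's tuple sort stands in for the radix sort over the three keys.

```python
# Order passage windows by (SameWordSequenceScore desc, DistanceScore desc,
# MissingKeywordScore asc), mirroring the radix sort described above.

def score_window(question_words, window_words):
    # SameWordSequenceScore: question words matched in question order (subsequence match)
    remaining = iter(window_words)
    same_seq = sum(1 for q in question_words if q in remaining)

    # DistanceScore: words separating the most distant matched keywords
    positions = [i for i, w in enumerate(window_words) if w in question_words]
    distance = (positions[-1] - positions[0]) if positions else 0

    # MissingKeywordScore: keywords not found in the window
    missing = sum(1 for q in question_words if q not in window_words)
    return same_seq, distance, missing

def order_windows(question_words, windows):
    def key(window):
        same_seq, distance, missing = score_window(question_words, window)
        return (-same_seq, -distance, missing)
    return sorted(windows, key=key)

keywords = ["private", "citizen", "fly", "space"]
windows = [["the", "first", "private", "citizen", "to", "fly", "in", "space"],
           ["space", "shuttle", "pilot", "private"]]
print(order_windows(keywords, windows))
```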
37 Answer Extraction
[Block diagram from slide 20, repeated to situate the Answer Extraction stage, which uses WordNet, the parser, and NER to produce the answer A from the retrieved passages.]
38 Ranking Candidate Answers
Q066: Name the first private citizen to fly in space.
- Answer type: Person
- Text passage: "Among them was Christa McAuliffe, the first private citizen to fly in space. Karen Allen, best known for her starring role in Raiders of the Lost Ark, plays McAuliffe. Brian Kerwin is featured as shuttle pilot Mike Smith..."
- Best candidate answer: Christa McAuliffe
39 Features for Answer Ranking
- relNMW: number of question terms matched in the answer passage
- relSP: number of question terms matched in the same phrase as the candidate answer
- relSS: number of question terms matched in the same sentence as the candidate answer
- relFP: flag set to 1 if the candidate answer is followed by a punctuation sign
- relOCTW: number of question terms matched, separated from the candidate answer by at most three words and one comma
- relSWS: number of terms occurring in the same order in the answer passage as in the question
- relDTW: average distance from the candidate answer to the question term matches
(SIGIR'01)
40 Answer Ranking based on Machine Learning
- A relative relevance score is computed for each pair of candidates (answer windows):
- relPAIR = w_SWS * Δrel_SWS + w_FP * Δrel_FP + w_OCTW * Δrel_OCTW + w_SP * Δrel_SP + w_SS * Δrel_SS + w_NMW * Δrel_NMW + w_DTW * Δrel_DTW + threshold
- If relPAIR is positive, then the first candidate of the pair is more relevant
- A perceptron model is used to learn the weights (a sketch follows below)
- Scores in the 0.50s MRR for short answers, in the 0.60s MRR for long answers
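A minimal sketch of pairwise ranking with a perceptron: the feature vector for a candidate pair is the difference of their relevance features, and the learned weights plus a bias term (the threshold) decide which candidate is more relevant.

```python
# Pairwise answer ranking with a perceptron over feature differences.
import random

FEATURES = ["SWS", "FP", "OCTW", "SP", "SS", "NMW", "DTW"]

def delta(cand_a, cand_b):
    """Feature difference Δrel between two candidates (dicts of feature values)."""
    return [cand_a[f] - cand_b[f] for f in FEATURES]

def train(pairs, epochs=20, lr=0.1):
    """pairs: list of (features_a, features_b, label), label +1 if a is more relevant."""
    w = [0.0] * len(FEATURES)
    bias = 0.0  # plays the role of the threshold term
    for _ in range(epochs):
        random.shuffle(pairs)
        for a, b, label in pairs:
            x = delta(a, b)
            score = sum(wi * xi for wi, xi in zip(w, x)) + bias
            if label * score <= 0:  # misranked pair: perceptron update
                w = [wi + lr * label * xi for wi, xi in zip(w, x)]
                bias += lr * label
    return w, bias

def first_is_more_relevant(w, bias, a, b):
    x = delta(a, b)
    return sum(wi * xi for wi, xi in zip(w, x)) + bias > 0
```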
41 Evaluation on the Web
- Test on 350 questions from TREC (Q250-Q600)
- Extract 250-byte answers

|  | Google | Answer extraction from Google | AltaVista | Answer extraction from AltaVista |
| --- | --- | --- | --- | --- |
| Precision score | 0.29 | 0.44 | 0.15 | 0.37 |
| Questions with a correct answer among top 5 returned answers | 0.44 | 0.57 | 0.27 | 0.45 |
42 Can we make this simpler?
- One reason systems became so complex is that they have to pick out one sentence within a small collection
- The answer is likely to be stated in a hard-to-recognize manner.
- Alternative idea:
- What happens with a much larger collection?
- The web is so huge that you're likely to see the answer stated in a form similar to the question
- Goal: make the simplest possible QA system by exploiting this redundancy in the web
- Use this as a baseline against which to compare more elaborate systems.
- The next slides are based on:
- Web Question Answering: Is More Always Better? Dumais, Banko, Brill, Lin, Ng, SIGIR'02
- An Analysis of the AskMSR Question-Answering System, Brill, Dumais, and Banko, EMNLP'02
43 AskMSR System Architecture
[System architecture diagram showing the five numbered steps, detailed on the following slides: rewrite the questions, query the search engine, gather n-grams, filter n-grams, and tile the answers.]
44 Step 1: Rewrite the questions
- Intuition: the user's question is often syntactically quite close to sentences that contain the answer
- Where is the Louvre Museum located?
- The Louvre Museum is located in Paris.
- Who created the character of Scrooge?
- Charles Dickens created the character of Scrooge.
45 Query rewriting
- Classify the question into one of seven categories
- Who is/was/are/were...?
- When is/did/will/are/were...?
- Where is/are/were...?
- a. Hand-crafted category-specific transformation rules (sketched below)
- e.g., for "where" questions, move "is" to all possible locations
- Look to the right of the query terms for the answer.
- Where is the Louvre Museum located?
- → is the Louvre Museum located
- → the is Louvre Museum located
- → the Louvre is Museum located
- → the Louvre Museum is located
- → the Louvre Museum located is
- b. Expected answer datatype (e.g., Date, Person, Location, ...)
- When was the French Revolution? → DATE
- Nonsense, but OK. It's only a few more queries to the search engine.
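A minimal sketch of the "where is" rewrite rule: move "is" into every position of the remainder and emit each string as a high-weight exact-phrase query, plus a low-weight bag-of-words fallback (weights 5 and 1, as on the next slide).

```python
# AskMSR-style query rewriting for "Where is X ...?" questions.
import re

def rewrite_where_is(question):
    m = re.match(r"(?i)where\s+is\s+(.+?)\??$", question.strip())
    if not m:
        return []
    words = m.group(1).split()
    rewrites = []
    for i in range(len(words) + 1):
        candidate = words[:i] + ["is"] + words[i:]
        rewrites.append(('"' + " ".join(candidate) + '"', 5))  # exact phrase, high weight
    rewrites.append((" ".join(words), 1))                      # plain keywords, low weight
    return rewrites

for query, weight in rewrite_where_is("Where is the Louvre Museum located?"):
    print(weight, query)
```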
46 Query Rewriting - weighting
- Some query rewrites are more reliable than others.
- Where is the Louvre Museum located?
- Weight 5 (if it matches, it's probably right): "the Louvre Museum is located"
- Weight 1 (lots of non-answers could come back too): Louvre Museum located
47 Step 2: Query search engine
- Send all rewrites to a Web search engine
- Retrieve the top N answers (100-200)
- For speed, rely just on the search engine's snippets, not the full text of the actual documents
48 Step 3: Gathering N-Grams
- Enumerate all N-grams (N = 1, 2, 3) in all retrieved snippets
- Weight of an n-gram: its occurrence count, with each occurrence weighted by the reliability (weight) of the rewrite rule that fetched the document (a sketch follows below)
- Example: Who created the character of Scrooge?
- Dickens 117
- Christmas Carol 78
- Charles Dickens 75
- Disney 72
- Carl Banks 54
- A Christmas 41
- Christmas Carol 45
- Uncle 31
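A minimal sketch of the weighted n-gram mining: every unigram, bigram, and trigram in a snippet is credited with the weight of the rewrite whose query retrieved that snippet.

```python
# Weighted n-gram mining over search-engine snippets.
from collections import Counter

def mine_ngrams(snippets_with_weights, max_n=3):
    """snippets_with_weights: iterable of (snippet_text, rewrite_weight)."""
    scores = Counter()
    for snippet, weight in snippets_with_weights:
        tokens = snippet.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                scores[" ".join(tokens[i:i + n])] += weight
    return scores

snippets = [
    ("Charles Dickens created the character of Scrooge", 5),
    ("Scrooge the miser appears in A Christmas Carol by Dickens", 1),
]
print(mine_ngrams(snippets).most_common(5))
```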
49 Step 4: Filtering N-Grams
- Each question type is associated with one or more data-type filters (regular expressions)
- When → Date
- Where → Location
- What
- Who → Person
- Boost the score of n-grams that match the regexp
- Lower the score of n-grams that don't match the regexp
- Details omitted from the paper.
50 Step 5: Tiling the Answers
- Tile the highest-scoring n-gram with overlapping n-grams; merged n-grams get the combined score and the old n-grams are discarded.
- Repeat until no more overlap.
- Example: "Charles Dickens" (20), "Dickens" (15), and "Mr Charles" (10) tile into "Mr Charles Dickens" with score 45 (a sketch follows below).
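A minimal sketch of answer tiling: greedily merge the highest-scoring n-gram with any n-gram it overlaps or contains, summing the scores, until no merge applies to the current best n-gram.

```python
# Greedy n-gram tiling.

def overlap_merge(a, b):
    """Return the tiled string if a and b overlap or one contains the other, else None."""
    ta, tb = a.split(), b.split()
    if all(w in ta for w in tb):
        return a
    if all(w in tb for w in ta):
        return b
    for k in range(min(len(ta), len(tb)), 0, -1):
        if ta[-k:] == tb[:k]:
            return " ".join(ta + tb[k:])
        if tb[-k:] == ta[:k]:
            return " ".join(tb + ta[k:])
    return None

def tile_answers(scored):
    """scored: dict mapping n-gram -> score."""
    items = dict(scored)
    while True:
        best = max(items, key=items.get)
        for other in list(items):
            if other == best:
                continue
            tiled = overlap_merge(best, other)
            if tiled is not None:
                combined = items.pop(best) + items.pop(other)
                items[tiled] = max(combined, items.get(tiled, 0))
                break
        else:
            return items  # no merge possible for the current best n-gram

print(tile_answers({"Charles Dickens": 20, "Dickens": 15, "Mr Charles": 10}))
```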
51 Results
- Standard TREC contest test-bed (TREC 2001): 1M documents; 900 questions
- The technique doesn't do too well (though it would have placed in the top 9 of 30 participants)
- MRR strict: .34
- MRR lenient: .43
- 9th place
52 Results
- From the EMNLP'02 paper
- MRR of .577; answers 61% correctly
- Would be near the top of the TREC-9 runs
- Breakdown of feature contribution
53 Issues
- Works best/only for Trivial Pursuit-style fact-based questions
- Limited/brittle repertoire of:
- question categories
- answer data types/filters
- query rewriting rules
54 Intermediate Approach: Surface pattern discovery
- Based on:
- Ravichandran, D. and Hovy, E.H., Learning Surface Text Patterns for a Question Answering System, ACL'02
- Hovy, et al., Question Answering in Webclopedia, TREC-9, 2000.
- Use of characteristic phrases
- "When was <person> born?"
- Typical answers:
- "Mozart was born in 1756."
- "Gandhi (1869-1948)..."
- Suggests regular expressions to help locate the correct answer (sketched below):
- "<NAME> was born in <BIRTHDATE>"
- "<NAME> (<BIRTHDATE>-"
55 Use Pattern Learning
- Examples:
- "The great composer Mozart (1756-1791) achieved fame at a young age"
- "Mozart (1756-1791) was a genius"
- "The whole world would always be indebted to the great music of Mozart (1756-1791)"
- The longest matching substring for all 3 sentences is "Mozart (1756-1791)"
- A suffix tree would extract "Mozart (1756-1791)" as an output, with a score of 3 (a simple sketch follows below)
- Reminiscent of IE pattern learning
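A minimal sketch of this step using a pairwise longest-common-substring dynamic program instead of a suffix tree; the result and its score match the example above.

```python
# Learn a candidate pattern as the longest substring shared by all examples.
from functools import reduce

def longest_common_substring(a, b):
    best = ""
    prev = [0] * (len(b) + 1)  # prev[j]: length of common suffix of a[:i-1] and b[:j]
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > len(best):
                    best = a[i - curr[j]:i]
        prev = curr
    return best

sentences = [
    "The great composer Mozart (1756-1791) achieved fame at a young age",
    "Mozart (1756-1791) was a genius",
    "The whole world would always be indebted to the great music of Mozart (1756-1791)",
]
pattern = reduce(longest_common_substring, sentences)
print(repr(pattern), "score:", sum(pattern in s for s in sentences))
```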
56 Pattern Learning (cont.)
- Repeat with different examples of the same question type
- Gandhi 1869, Newton 1642, etc.
- Some patterns learned for BIRTHDATE:
- a. born in <ANSWER>, <NAME>
- b. <NAME> was born on <ANSWER>,
- c. <NAME> (<ANSWER> -
- d. <NAME> (<ANSWER> - )
57 QA Typology from ISI
- Typology of typical question forms: 94 nodes (47 leaf nodes)
- Analyzed 17,384 questions (from answers.com)
58 Experiments
- 6 different question types from the Webclopedia QA Typology:
- BIRTHDATE
- LOCATION
- INVENTOR
- DISCOVERER
- DEFINITION
- WHY-FAMOUS
59 Experiments: pattern precision
- BIRTHDATE
- 1.0 <NAME> ( <ANSWER> - )
- 0.85 <NAME> was born on <ANSWER>,
- 0.6 <NAME> was born in <ANSWER>
- 0.59 <NAME> was born <ANSWER>
- 0.53 <ANSWER> <NAME> was born
- 0.50 - <NAME> ( <ANSWER>
- 0.36 <NAME> ( <ANSWER> -
- INVENTOR
- 1.0 <ANSWER> invents <NAME>
- 1.0 the <NAME> was invented by <ANSWER>
- 1.0 <ANSWER> invented the <NAME> in
60 Experiments (cont.)
- DISCOVERER
- 1.0 when <ANSWER> discovered <NAME>
- 1.0 <ANSWER>'s discovery of <NAME>
- 0.9 <NAME> was discovered by <ANSWER> in
- DEFINITION
- 1.0 <NAME> and related <ANSWER>
- 1.0 form of <ANSWER>, <NAME>
- 0.94 as <NAME>, <ANSWER> and
61 Experiments (cont.)
- WHY-FAMOUS
- 1.0 <ANSWER> <NAME> called
- 1.0 laureate <ANSWER> <NAME>
- 0.71 <NAME> is the <ANSWER> of
- LOCATION
- 1.0 <ANSWER>'s <NAME>
- 1.0 regional <ANSWER> <NAME>
- 0.92 near <NAME> in <ANSWER>
- Depending on the question type, the system gets high MRR (0.6-0.9), with higher results from use of the Web than the TREC QA collection
62 Shortcomings & Extensions
- Need for POS and/or semantic types
- "Where are the Rocky Mountains?"
- "Denver's new airport, topped with white fiberglass cones in imitation of the Rocky Mountains in the background, continues to lie empty"
- <NAME> in <ANSWER>
- An NE tagger and/or ontology could enable the system to determine that "background" is not a location
63 Shortcomings... (cont.)
- Long-distance dependencies
- "Where is London?"
- "London, which has one of the busiest airports in the world, lies on the banks of the river Thames"
- would require a pattern like: <QUESTION>, (<any_word>), lies on <ANSWER>
- The abundance and variety of Web data helps the system find an instance of its patterns without losing answers to long-distance dependencies
64 Shortcomings... (cont.)
- The system currently has only one anchor word
- Doesn't work for question types requiring multiple words from the question to be in the answer
- "In which county does the city of Long Beach lie?"
- "Long Beach is situated in Los Angeles County"
- required pattern: <Q_TERM_1> is situated in <ANSWER> <Q_TERM_2>
- Does not use case
- "What is a micron?"
- "...a spokesman for Micron, a maker of semiconductors, said SIMMs are..."
- If "Micron" had been capitalized in the question, this would be a perfect answer
65 The Importance of NER
- The results of the past 5 TREC evaluations of QA systems indicate that current state-of-the-art QA is determined by the recognition of Named Entities
- In TREC 2003 the LCC QA system extracted 289 correct answers for factoid questions
- The Named Entity Recognizer was responsible for 234 of them:

| Answer type | Count | Answer type | Count | Answer type | Count |
| --- | --- | --- | --- | --- | --- |
| QUANTITY | 55 | ORGANIZATION | 15 | PRICE | 3 |
| NUMBER | 45 | AUTHORED WORK | 11 | SCIENCE NAME | 2 |
| DATE | 35 | PRODUCT | 11 | ACRONYM | 1 |
| PERSON | 31 | CONTINENT | 5 | ADDRESS | 1 |
| COUNTRY | 21 | PROVINCE | 5 | ALPHABET | 1 |
| OTHER LOCATIONS | 19 | QUOTE | 5 | URI | 1 |
| CITY | 19 | UNIVERSITY | 3 |  |  |
66 The Special Case of Names
Questions asking for names of authored works:

| Question | Answer |
| --- | --- |
| 1934: What is the play West Side Story based on? | Romeo and Juliet |
| 1976: What is the motto for the Boy Scouts? | Be prepared. |
| 1982: What movie won the Academy Award for best picture in 1989? | Driving Miss Daisy |
| 2080: What peace treaty ended WWI? | Versailles |
| 2102: What American landmark stands on Liberty Island? | Statue of Liberty |
67 Problems
- NE recognition assumes all answers are named entities
- Oversimplifies the generative power of language!
- What about "What kind of flowers did Van Gogh paint?"
- Does not account well for morphological, lexical, and semantic alternations
- Question terms may not exactly match answer terms; connections between alternations of Q and A terms are often not documented in a flat dictionary
- Example: "When was Berlin's Brandenburger Tor erected?" → no guarantee of matching "built"
- Recall suffers
68 LCC Approach: WordNet to the rescue!
- WordNet can be used to inform all three steps of the Q/A process:
- 1. Answer-type recognition (Answer Type Taxonomy)
- 2. Passage retrieval (specificity constraints)
- 3. Answer extraction (recognition of keyword alternations)
- Using WN's lexico-semantic info: examples
- What kind of flowers did Van Gogh paint?
- Answer-type recognition: need to know (a) the answer is a kind of flower, and (b) the sense of the word "flower"
- WordNet encodes 470 hyponyms of flower sense 1, flowers as plants
- Nouns from retrieved passages can be searched against these hyponyms
- When was Berlin's Brandenburger Tor erected?
- Semantic alternation: "erect" is a hyponym of sense 1 of "build"
69 WN for Answer Type Recognition
- Encodes 8707 English concepts to help recognize the expected answer type
- Mapping to parts of WordNet done by hand
- Can connect to Noun, Adj, and/or Verb subhierarchies
70 WN in Passage Retrieval
- Identify relevant passages from text
- Extract keywords from the question, and
- Pass them to the retrieval module
- Specificity filtering of question concepts/keywords
- Focuses the search, improves performance and precision
- Question keywords can be omitted from the search if they are too general
- Specificity is calculated by counting the hyponyms of a given keyword in WordNet (a sketch follows below)
- The count ignores proper names and same-headed concepts
- A keyword is thrown out if its count is above a given threshold (currently 10)
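A minimal sketch of the specificity filter using NLTK's WordNet interface; it simply counts direct hyponyms over all noun senses and drops keywords above the threshold, omitting the proper-name and same-headed-concept exclusions described above.

```python
# Drop keywords that are too general, measured by WordNet hyponym counts.
# Requires: nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def hyponym_count(word):
    return sum(len(synset.hyponyms()) for synset in wn.synsets(word, pos=wn.NOUN))

def filter_keywords(keywords, threshold=10):
    return [k for k in keywords if hyponym_count(k) <= threshold]

for word in ["flower", "treaty", "oceanographer"]:
    print(word, hyponym_count(word))
print(filter_keywords(["oceanographer", "Calypso"]))
```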
71 WN in Answer Extraction
- If keywords alone cannot find an acceptable answer, look for alternations in WordNet!

| Question | Alternation | Answer |
| --- | --- | --- |
| Q196: Who wrote Hamlet? | Morphological: wrote → written | "... before the young playwright has written Hamlet and Shakespeare seizes the opportunity ..." |
| Q136: Who is the queen of Holland? | Lexical: Holland → Netherlands | "... Princess Margriet, sister of Queen Beatrix of the Netherlands, was also present ..." |
| Q196: What is the highest mountain in the world? | Semantic: mountain → peak | "... first African country to send an expedition to Mount Everest, the world's highest peak ..." |
72 Evaluation
- Pasca/Harabagiu (NAACL'01 Workshop) measured the approach using the TREC-8 and TREC-9 test collections
- WN contributions to Answer Type Recognition
- Count the number of questions for which acceptable answers were found: 3GB text collection, 893 questions

| Method | All questions with correct answer type | "What" questions only |
| --- | --- | --- |
| Flat dictionary (baseline) | 227 (32%) | 48 (13%) |
| A-type taxonomy (static) | 445 (64%) | 179 (50%) |
| A-type taxonomy (dynamic) | 463 (67%) | 196 (56%) |
| A-type taxonomy (dynamic + answer patterns) | 533 (76%) | 232 (65%) |
73 Evaluation
- WN contributions to Passage Retrieval
- Impact of keyword alternations:

| Alternations | Precision |
| --- | --- |
| No alternations enabled | 55.3% |
| Lexical alternations enabled | 67.6% |
| Lexical + semantic alternations enabled | 73.7% |
| Morphological expansions enabled | 76.5% |

- Impact of specificity knowledge (questions with the correct answer in the first 5 documents returned):

| Specificity knowledge | TREC-8 | TREC-9 |
| --- | --- | --- |
| Not included | 133 (65%) | 463 (67%) |
| Included | 151 (76%) | 515 (74%) |
74 Going Beyond Word Matching
- Use techniques from artificial intelligence to try to draw inferences from the meanings of the words
- This is a highly unusual and ambitious approach.
- Surprising that it works at all!
- Requires huge amounts of hand-coded information
- Uses notions of proofs and inference from logic
- All birds fly. Robins are birds. Thus, robins fly.
- forall(X) bird(X) -> fly(X)
- forall(X,Y) student(X), enrolled(X,Y) -> school(Y)
75 Inference via a Logic Prover
- The LCC system attempts inference to justify an answer
- Its inference engine is a kind of funny middle ground between logic and pattern matching
- But quite effective: 30% improvement
- Q: When was the internal combustion engine invented?
- A: The first internal-combustion engine was built in 1867.
- invent -> create_mentally -> create -> build
76 COGEX
- World knowledge from:
- WordNet glosses converted to logic forms in the eXtended WordNet (XWN) project
- Lexical chains
- game:n#3 → HYPERNYM → recreation:n#1 → HYPONYM → sport:n#1
- Argentine:a#1 → GLOSS → Argentina:n#1
- NLP axioms to handle complex NPs, coordinations, appositions, equivalence classes for prepositions, etc.
- Named-entity recognizer
- John Galt → HUMAN
- A relaxation mechanism is used to iteratively uncouple predicates and remove terms from the logic forms. Proofs are penalized based on the amount of relaxation involved.
77 Logic Inference Example
- How hot does the inside of an active volcano get?
- get(TEMPERATURE, inside(volcano(active)))
- "lava fragments belched out of the mountain were as hot as 300 degrees Fahrenheit"
- fragments(lava, TEMPERATURE(degrees(300)), belched(out, mountain))
- volcano ISA mountain
- lava ISPARTOF volcano -> lava inside volcano
- fragments of lava HAVEPROPERTIESOF lava
- The needed semantic information is in WordNet definitions, and was successfully translated into a form that was used for rough proofs
78 Axiom Creation
- XWN Axioms
- A major source of world knowledge is a general-purpose knowledge base of more than 50,000 parsed and disambiguated WordNet glosses that are transformed into logical form for use during the course of a proof.
- Gloss
- "Kill" is "to cause to die"
- Logical form
- kill_VB_1(e1,x1,x2) -> cause_VB_1(e1,x1,x3) & to_TO(e1,e2) & die_VB_1(e2,x2,x4)
79 Lexical Chains
- Lexical chains provide an improved source of world knowledge by supplying the logic prover with much-needed axioms to link question keywords with answer concepts.
- Question
- How were biological agents acquired by bin Laden?
- Answer
- "On 8 July 1998, the Italian newspaper Corriere della Serra indicated that members of The World Front for Fighting Jews and Crusaders, which was founded by Bin Laden, purchased three chemical and biological_agent production facilities in ..."
- Lexical chain
- ( v - buy#1, purchase#1 ) HYPERNYM ( v - get#1, acquire#1 )
80 Axiom Selection
- Lexical chains and the XWN knowledge base work together to select and generate the axioms needed for a successful proof when not all the keywords in the question are found in the answer.
- Question
- How did Adolf Hitler die?
- Answer
- Adolf Hitler committed suicide
- The following lexical chain is detected:
- ( n - suicide#1, self-destruction#1, self-annihilation#1 ) GLOSS ( v - kill#1 ) GLOSS ( v - die#1, decease#1, perish#1, go#17, exit#3, pass_away#1, expire#2, pass#25 )
- The following axioms are loaded into the prover:
- exists x2 all e1 x1 (suicide_nn(x1) -> act_nn(x1) & of_in(x1,e1) & kill_vb(e1,x2,x2)).
- exists x3 x4 all e2 x1 x2 (kill_vb(e2,x1,x2) -> cause_vb_2(e1,x1,x3) & to_to(e1,e2) & die_vb(e2,x2,x4)).
81 LCC System References
- The previous set of slides drew information from these sources:
- Pasca and Harabagiu, The Informative Role of WordNet in Open-Domain Question Answering, NAACL 2001 Workshop on WordNet and Other Lexical Resources
- Pasca and Harabagiu, High Performance Question/Answering, SIGIR'01
- Moldovan, Clark, Harabagiu, Maiorano, COGEX: A Logic Prover for Question Answering, HLT-NAACL 2003
- Moldovan, Pasca, Harabagiu, and Surdeanu, Performance issues and error analysis in an open-domain question answering system, ACM Trans. Inf. Syst. 21(2): 133-154 (2003)
- Harabagiu and Maiorano, Abductive Processes for Answer Justification, AAAI Spring Symposium on Mining Answers from Texts and Knowledge Bases, 2002
82 Using Machine Learning in QA
- The following slides are based on:
- Ramakrishnan, Chakrabarti, Paranjpe, Bhattacharyya, Is Question Answering an Acquired Skill?, WWW'04
83 Learning Answer Type Mapping
- Idea: use machine learning techniques to automatically determine answer types and query terms from questions.
- Two kinds of answer types:
- Surface patterns
- An infinite set, so they can't be covered by a lexicon
- Example categories: DATES, NUMBERS, PERSON NAMES, LOCATIONS
- Example patterns: "at DDDD", "in the DDs", "in DDDD", "Xx said"
- A pattern can also be associated with a synset, e.g., sense 7 of the noun "date"
- WordNet synsets
- Consider "name an animal that sleeps upright"
- Answer: horse
84 Determining Answer Types
- The hard ones are "what" and "which" questions.
- Two useful heuristics:
- If the head of the NP appearing before the auxiliary or main verb is not a wh-word, mark it as an a-type clue
- Otherwise, the head of the NP appearing after the auxiliary/main verb is an a-type clue.
85 Learning Answer Types
- Given a QA pair (q, a):
- (name an animal that sleeps upright, horse)
- (1a) See which atype(s) "horse" can map to
- (1b) Look up the hypernyms of "horse" -> S
- (2a) Record the k words to the right of the q-word
- (2b) For each of these k words, look up their synsets
- an, animal, that
- (2c) Increment the counts for those synsets that also appear in S
- Do significance testing (a sketch follows below)
- Compare synset frequencies against a background set
- Retain only those that are significantly more associated with the question word than in general (chi-square)
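A minimal sketch of the chi-square significance test, assuming SciPy: a synset is kept as an answer-type candidate only if it is significantly over-represented in the windows to the right of the question word compared to a background sample.

```python
# 2x2 chi-square test for association between a synset and a question word.
from scipy.stats import chi2_contingency

def synset_is_atype(count_near_qword, total_near_qword,
                    count_background, total_background, alpha=0.001):
    table = [
        [count_near_qword, total_near_qword - count_near_qword],
        [count_background, total_background - count_background],
    ]
    chi2, p_value, dof, expected = chi2_contingency(table)
    over_represented = (count_near_qword / total_near_qword
                        > count_background / total_background)
    return over_represented and p_value < alpha

# Hypothetical counts: a synset appears in 40 of 500 windows after "name",
# versus 200 of 100000 background windows.
print(synset_is_atype(40, 500, 200, 100000))
```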
86 Learning Answer Types
87 Learning to Choose Query Terms
- Which words from the question should be used in the query?
- A tradeoff between precision and recall.
- Example: Tokyo is the capital of which country?
- Want to use "Tokyo" verbatim
- Probably "capital" as well
- But maybe not "country": maybe "nation", or maybe this word won't appear in the retrieved passage at all.
- Also, "country" corresponds to the answer type, so we probably don't want to require it to be in the answer text.
88 Learning to Choose Query Terms
- Features
- POS assigned to the word and its immediate neighbors
- Starts with an uppercase letter
- Is a stopword
- IDF score
- Is an answer-type for this question
- Ambiguity indicators
- Number of possible WordNet senses (NumSense)
- Number of other WordNet synsets that describe this sense (NumLemma), e.g., for buck: stag, deer, doe
- Learner
- A J48 decision tree worked best
89 Learning to Choose Query Terms
- Results
- WordNet ambiguity indicators were very helpful
- Raised accuracy from 71-73% to 80%
- The atype flag improved accuracy by 1-3%
90 Learning to Score Passages
- Given a question, an answer, and a passage (q, a, r):
- Assign +1 if r contains a
- Assign -1 otherwise
- Features
- Do the selected terms s from q appear in r?
- Does r have an answer zone a that does not overlap s?
- Are the distances between tokens in a and s small?
- Does a have a strong WordNet similarity with q's answer type?
- Learner
- Use logistic regression, since it produces a ranking rather than a hard classification into +1 or -1
- Produces a continuous estimate between 0 and 1 (a sketch follows below)
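A minimal sketch of passage scoring with logistic regression, assuming scikit-learn; the feature vectors are hypothetical stand-ins for the features listed above, and the predicted probability is used to re-rank passages.

```python
# Train a logistic regression scorer on (q, a, r) triples and use the
# continuous probability of the positive class to re-rank passages.
from sklearn.linear_model import LogisticRegression

# Each row: [q-terms matched in r, q-terms in the answer zone, avg token distance]
X = [[3, 2, 1.0],
     [1, 0, 6.0],
     [4, 3, 1.5],
     [0, 0, 8.0]]
y = [1, 0, 1, 0]  # 1 if the passage contains the answer, 0 otherwise

model = LogisticRegression().fit(X, y)
scores = model.predict_proba(X)[:, 1]   # continuous estimates in (0, 1)
ranking = sorted(range(len(X)), key=lambda i: -scores[i])
print(ranking, scores)
```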
91 Learning to Score Passages
- Results
- F-scores are low (.33 - .56)
- However, re-ranking greatly improves the rank of the corresponding passages.
- Eliminates many non-answers, pushing better passages towards the top.
92 Learning to Score Passages
93 Computing WordNet Similarity
- Path-based similarity measures are not all that good in WordNet
- 3 hops from entity to artifact
- 3 hops from mammal to elephant
- An alternative:
- Given a target synset t and an answer synset a
- Measure the overlap of nodes on the paths
- from t to all noun roots and
- from a to all noun roots
- Algorithm for computing the similarity of t to a (a sketch follows below):
- If t is not a hypernym of a, assign 0
- Else collect the sets of hypernym synsets of t and a
- Call them Ht and Ha
- Compute the Jaccard overlap: |Ht ∩ Ha| / |Ht ∪ Ha|
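A minimal sketch of the hypernym-overlap similarity using NLTK's WordNet interface; note that counts in WordNet 3.0 differ slightly from the figures on the next slide, which were computed over an older hierarchy.

```python
# Jaccard overlap of the hypernym sets of a target synset t and answer synset a.
# Requires: nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def hypernyms_of(synset):
    """All (transitive) hypernym synsets of the given synset."""
    return set(synset.closure(lambda s: s.hypernyms()))

def atype_similarity(t, a):
    ha = hypernyms_of(a)
    if t not in ha:           # t must be a hypernym of a
        return 0.0
    ht = hypernyms_of(t)
    return len(ht & ha) / len(ht | ha)

mammal = wn.synset("mammal.n.01")
elephant = wn.synset("elephant.n.01")
print(atype_similarity(mammal, elephant))
```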
94 Computing WordNet Similarity
- Algorithm for computing the similarity of t to a: |Ht ∩ Ha| / |Ht ∪ Ha|
[Figure: noun hierarchy entity → object → living thing → organism → animal → chordate → vertebrate → mammal → placental mammal → proboscidean → elephant, with the regions Ht ∩ Ha and Ht ∪ Ha marked.]
- t = mammal, a = elephant: 7/10 = .7
- t = animal, a = elephant: 5/10 = .5
- t = animal, a = mammal: 4/7 = .57
- t = mammal, a = fox: 7/11 = .63
95 System Extension: Definition Questions
- Definition questions ask about the definition or description of a concept:
- Who is John Galt?
- What is anorexia nervosa?
- Many information nuggets are acceptable answers
- Who is George W. Bush?
- "George W. Bush, the 43rd President of the United States ..."
- "George W. Bush defeated Democratic incumbent Ann Richards to become the 46th Governor of the State of Texas ..."
- Scoring
- Any information nugget is acceptable
- Precision score over all information nuggets
96 Definition Detection with Pattern Matching

| Question | Detected answer |
| --- | --- |
| Q386: What is anorexia nervosa? | "cause of anorexia nervosa, an eating disorder..." |
| Q358: What is a meerkat? | "the meerkat, a type of mongoose, thrives in..." |
| Q340: Who is Zebulon Pike? | "in 1806, explorer Zebulon Pike sighted the..." |
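A minimal sketch of appositive-style definition patterns like the ones matched above; the two regexes are illustrative stand-ins, not the system's actual pattern set.

```python
# Find appositive definition candidates such as "<term>, a <definition>".
import re

def definition_candidates(term, text):
    patterns = [
        rf"{re.escape(term)},? (?:a|an|the) ([^,.]+)",
        rf"([^,.]+), (?:a|an|the) {re.escape(term)}",
    ]
    hits = []
    for p in patterns:
        hits += re.findall(p, text, flags=re.IGNORECASE)
    return hits

print(definition_candidates("meerkat",
                            "the meerkat, a type of mongoose, thrives in the Kalahari"))
```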
97 Answer Detection with Concept Expansion
- Enhancement for definition questions
- Identify terms that are semantically related to the phrase being defined
- Use WordNet hypernyms (more general concepts)

| Question | WordNet hypernym | Detected answer candidate |
| --- | --- | --- |
| What is a shaman? | priest, non-Christian priest | "Mathews is the priest or shaman..." |
| What is a nematode? | worm | "nematodes, tiny worms in soil..." |
| What is anise? | herb, herbaceous plant | "anise, rhubarb and other herbs..." |
98 Online QA Examples
- Examples (none work very well)
- AnswerBus
- http://www.answerbus.com
- Ionaut
- http://www.ionaut.com:8400/
- LCC
- http://www.languagecomputer.com/demos/question_answering/index.html