Title: Answering Questions by Computer
1. Answering Questions by Computer
2. Terminology: Question Type
- Question Type: an idiomatic categorization of questions for purposes of distinguishing between different processing strategies and/or answer formats.
- E.g. TREC 2003:
  - FACTOID: How far is it from Earth to Mars?
  - LIST: List the names of chewing gums
  - DEFINITION: Who is Vlad the Impaler?
- Other possibilities:
  - RELATIONSHIP: What is the connection between Valentina Tereshkova and Sally Ride?
  - SUPERLATIVE: What is the largest city on Earth?
  - YES-NO: Is Saddam Hussein alive?
  - OPINION: What do most Americans think of gun control?
  - CAUSE-EFFECT: Why did Iraq invade Kuwait?
3. Terminology: Answer Type
- Answer Type: the class of object (or rhetorical type of sentence) sought by the question. E.g.:
  - PERSON (from "Who...")
  - PLACE (from "Where...")
  - DATE (from "When...")
  - NUMBER (from "How many...")
- but also:
  - EXPLANATION (from "Why...")
  - METHOD (from "How...")
- Answer types are usually tied intimately to the classes recognized by the system's Named Entity Recognizer.
4. Terminology: Question Focus
- Question Focus: the property or entity that is being sought by the question.
- E.g.:
  - In what state is the Grand Canyon?
  - What is the population of Bulgaria?
  - What colour is a pomegranate?
5. Terminology: Question Topic
- Question Topic: the object (person, place, ...) or event that the question is about. The question might well be about a property of the topic, which will be the question focus.
- E.g. What is the height of Mt. Everest?
  - height is the focus
  - Mt. Everest is the topic
6. Terminology: Candidate Passage
- Candidate Passage: a text passage (anything from a single sentence to a whole document) retrieved by a search engine in response to a question.
- Depending on the query and the kind of index used, there may or may not be a guarantee that a candidate passage has any candidate answers.
- Candidate passages will usually have associated scores from the search engine.
7. Terminology: Candidate Answer
- Candidate Answer: in the context of a question, a small quantity of text (anything from a single word to a sentence or bigger, but usually a noun phrase) that is of the same type as the Answer Type.
- In some systems, the type match may be approximate, if there is a concept of confusability.
- Candidate answers are found in candidate passages.
- E.g.:
  - 50
  - Queen Elizabeth II
  - September 8, 2003
  - by baking a mixture of flour and water
8. Terminology: Authority List
- Authority List (or File): a collection of instances of a class of interest, used to test a term for class membership.
- Instances should be derived from an authoritative source and be as close to complete as possible.
- Ideally, the class is small, easily enumerated, and its members have a limited number of lexical forms.
- Good:
  - Days of week
  - Planets
  - Elements
- Good statistically, but difficult to get 100% recall:
  - Animals
  - Plants
  - Colours
- Problematic:
  - People
  - Organizations
- Impossible:
  - All numeric quantities
  - Explanations and other clausal quantities
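The membership test an authority list supports can be sketched in a few lines; the class and normalization below are illustrative, not from any particular system.

```python
# Minimal sketch of an authority-list membership test: normalize the
# term, then test set membership against an enumerated class.
WEEKDAYS = {"monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"}

def is_weekday(term: str) -> bool:
    """Lowercase and trim the term, then test list membership."""
    return term.strip().lower() in WEEKDAYS

print(is_weekday("Tuesday"))   # True
print(is_weekday("Pluto"))     # False
```

A real system would also map lexical variants (e.g. "Tues.") to a canonical form before the lookup.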
9. Essence of Text-based QA (Single-source answers)
- Need to find a passage that answers the question:
  - Find a candidate passage (search)
  - Check that the semantics of the passage and the question match
  - Extract the answer
10. Ranking Candidate Answers
- Q066: Name the first private citizen to fly in space.
- Answer type: PERSON
- Text passage: "Among them was Christa McAuliffe, the first private citizen to fly in space. Karen Allen, best known for her starring role in Raiders of the Lost Ark, plays McAuliffe. Brian Kerwin is featured as shuttle pilot Mike Smith..."
11. Answer Extraction
- Also called Answer Selection/Pinpointing.
- Given a question and candidate passages, the process of selecting and ranking candidate answers.
- Usually, candidate answers are those terms in the passages which have the same answer type as that generated from the question.
- Ranking the candidate answers depends on assessing how well the passage context relates to the question.
- 3 approaches:
  - Heuristic features
  - Shallow parse fragments
  - Logical proof
12. Features for Answer Ranking
- Number of question terms matched in the answer passage
- Number of question terms matched in the same phrase as the candidate answer
- Number of question terms matched in the same sentence as the candidate answer
- Flag set to 1 if the candidate answer is followed by a punctuation sign
- Number of question terms matched, separated from the candidate answer by at most three words and one comma
- Number of terms occurring in the same order in the answer passage as in the question
- Average distance from the candidate answer to question term matches
(SIGIR '01)
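Two of these features can be sketched directly over tokenized text; this is a hedged illustration of the idea, not the SIGIR '01 implementation.

```python
# Compute two answer-ranking features: question-term matches in the
# candidate's sentence, and average distance (in tokens) from the
# candidate answer to those matches.
def ranking_features(question_terms, sentence_tokens, answer_index):
    matches = [i for i, tok in enumerate(sentence_tokens)
               if tok.lower() in question_terms]
    same_sentence = len(matches)
    avg_distance = (sum(abs(i - answer_index) for i in matches) / len(matches)
                    if matches else 0.0)
    return same_sentence, avg_distance

q = {"first", "private", "citizen", "fly", "space"}
s = "Christa McAuliffe the first private citizen to fly in space".split()
print(ranking_features(q, s, 1))  # candidate answer "McAuliffe" at index 1
```

A full ranker would combine many such features, typically with hand-tuned or learned weights.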
13. Heuristics for Answer Ranking in the Lasso System
- Same_word_sequence_score: number of words from the question that are recognized in the same sequence in the passage.
- Punctuation_sign_score: a flag set to 1 if the candidate answer is followed by a punctuation sign.
- Comma_3_word_score: the number of question words that follow the candidate answer, if the candidate is followed by a comma.
- Same_parse_subtree_score: number of question words found in the parse sub-tree of the answer.
- Same_sentence_score: number of question words found in the answer's sentence.
- Distance_score: adds the distance (measured in number of words) between the answer candidate and the other keywords in the window.
14. Heuristics for Answer Ranking in the Lasso System (continued)
15. Evaluation
- Evaluation of this kind of system is usually based on some kind of TREC-like metric.
- In QA the most frequent metric is Mean Reciprocal Rank (MRR):
  - You're allowed to return N answers. Your score is based on 1/rank of the first right answer.
  - Averaged over all the questions you answer.
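The metric is simple enough to state as code; the example ranks below are made up.

```python
# Mean reciprocal rank: score each question by 1/rank of its first
# correct answer (0 if no returned answer is correct), then average.
def mean_reciprocal_rank(ranks):
    """ranks: per-question rank of the first correct answer, or None."""
    return sum(1.0 / r for r in ranks if r) / len(ranks)

# e.g. first correct answers at ranks 1 and 2, and one question missed:
print(mean_reciprocal_rank([1, 2, None]))  # (1 + 0.5 + 0) / 3 = 0.5
```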
16. Answer Types and Modifiers
- Name 5 French cities
- Most likely there is no type for FRENCH CITY, so the system will look for CITY, and either:
  - include French/France in the bag of words, and hope for the best
  - include French/France in the bag of words, retrieve documents, and look for evidence (deep parsing, logic)
  - use high-precision language identification on the results
- If you have a list of French cities, you could either:
  - Filter results by the list
  - Use Answer-Based QA (see later)
  - Use longitude/latitude information of cities and countries
17. Answer Types and Modifiers
- Name a female figure skater
- Most likely there is no type for FEMALE FIGURE SKATER, or even for FIGURE SKATER.
- Look for PERSON, with query terms "figure", "skater".
- What to do about "female"? Two approaches:
  - Include "female" in the bag-of-words.
    - Relies on the logic that if femaleness is an interesting property, it might well be mentioned in answer passages.
    - Does not apply to, say, "singer".
  - Leave out "female" but test candidate answers for gender.
    - Needs either an authority file or a heuristic test.
    - The test may not be definitive.
18. Part II: Specific Approaches
- By genre:
  - Statistical QA
  - Pattern-based QA
  - Web-based QA
  - Answer-based QA (TREC only)
- By system:
  - SMU
  - LCC
  - USC-ISI
  - Insight
  - Microsoft
  - IBM statistical
  - IBM rule-based
19. Statistical QA
- Use statistical distributions to model likelihoods of answer type and answer.
- E.g. IBM (Ittycheriah, 2001); see later section.
20. Pattern-based QA
- For a given question type, identify the typical syntactic constructions used in text to express answers to such questions.
- Typically very high precision, but a lot of work to get decent recall.
21. Web-Based QA
- Exhaustive string transformations (Brill et al., 2002)
- Learning (Radev et al., 2001)
22. Answer-Based QA
- Problem: sometimes it is very easy to find an answer to a question using resource A, but the task demands that you find it in resource B.
- Solution: first find the answer in resource A, then locate the same answer, along with the original question terms, in resource B.
- An artificial problem, but a real one for TREC participants.
23. Answer-Based QA
- "When a QA system looks for answers within a relatively small textual collection, the chance of finding strings/sentences that closely match the question string is small. However, when a QA system looks for strings/sentences that closely match the question string on the web, the chance of finding a correct answer is much higher." (Hermjakob et al., 2002)
- Why this is true:
  - The Web is much larger than the TREC corpus (roughly 3,000:1).
  - TREC questions are generated from Web logs, and the style of language (and the subjects of interest) in these logs is more similar to Web content than to newswire collections.
24. Answer-Based QA
- Database/knowledge-base/ontology solution:
  - When question syntax is simple and reliably recognizable, it can be expressed as a logical form.
  - The logical form represents the entire semantics of the question, and can be used to access a structured resource:
    - WordNet
    - On-line dictionaries
    - Tables of facts and figures
    - Knowledge-bases such as Cyc
- Having found the answer:
  - construct a query with the original question terms plus the answer
  - retrieve passages
  - tell Answer Extraction the answer it is looking for
25. Approaches of Specific Systems
- SMU: Falcon
- LCC
- USC-ISI
- Insight
- Microsoft
- IBM
Note: some of the slides and/or examples in these sections are taken from papers or presentations by the respective system authors.
26. SMU: Falcon
(Harabagiu et al., 2000)
27. SMU: Falcon
- From the question, a dependency structure called the question semantic form is created.
- The query is a Boolean conjunction of terms.
- From answer passages that contain at least one instance of the answer type, generate the answer semantic form.
- 3 processing loops:
  - Loop 1: triggered when too few or too many passages are retrieved from the search engine
  - Loop 2: triggered when the question semantic form and the answer semantic form cannot be unified
  - Loop 3: triggered when unable to perform an abductive proof of answer correctness
28. SMU: Falcon
- The loops provide opportunities to perform alternations:
  - Loop 1: morphological expansions and nominalizations
  - Loop 2: lexical alternations (synonyms, direct hypernyms and hyponyms)
  - Loop 3: paraphrases
- Evaluation (Pasca & Harabagiu, 2001): increase in accuracy on the 50-byte task in TREC-9:
  - Loop 1: 40%
  - Loop 2: 52%
  - Loop 3: 8%
  - Combined: 76%
29. LCC
- (Moldovan & Rus, 2001)
- Uses a Logic Prover for answer justification, drawing on:
  - Question logical form
  - Candidate answers in logical form
  - XWN glosses
  - Linguistic axioms
  - Lexical chains
- The inference engine attempts to verify an answer by negating the question and proving a contradiction.
- If the proof fails, predicates in the question are gradually relaxed until the proof succeeds or the associated proof score falls below a threshold.
30. LCC: Lexical Chains
- Q1518: What year did Marco Polo travel to Asia?
- Answer: "Marco Polo divulged the truth after returning in 1292 from his travels, which included several months on Sumatra."
- Lexical chains:
  - (1) travel_to:v#1 -> GLOSS -> travel:v#1 -> RGLOSS -> travel:n#1
  - (2) travel_to:v#1 -> GLOSS -> travel:v#1 -> HYPONYM -> return:v#1
  - (3) Sumatra:n#1 -> ISPART -> Indonesia:n#1 -> ISPART -> Southeast_Asia:n#1 -> ISPART -> Asia:n#1
- Q1570: What is the legal age to vote in Argentina?
- Answer: "Voting is mandatory for all Argentines aged over 18."
- Lexical chains:
  - (1) legal:a#1 -> GLOSS -> rule:n#1 -> RGLOSS -> mandatory:a#1
  - (2) age:n#1 -> RGLOSS -> aged:a#3
  - (3) Argentine:a#1 -> GLOSS -> Argentina:n#1
31. LCC: Logic Prover
- Question: Which company created the Internet Browser Mosaic?
- QLF: (_organization_AT(x2)) & company_NN(x2) & create_VB(e1,x2,x6) & Internet_NN(x3) & browser_NN(x4) & Mosaic_NN(x5) & nn_NNC(x6,x3,x4,x5)
- Answer passage: "... Mosaic, developed by the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign ..."
- ALF: ... Mosaic_NN(x2) & develop_VB(e2,x2,x31) & by_IN(e2,x8) & National_NN(x3) & Center_NN(x4) & for_NN(x5) & Supercomputing_NN(x6) & application_NN(x7) & nn_NNC(x8,x3,x4,x5,x6,x7) & NCSA_NN(x9) & at_IN(e2,x15) & University_NN(x10) & of_NN(x11) & Illinois_NN(x12) & at_NN(x13) & Urbana_NN(x14) & nn_NNC(x15,x10,x11,x12,x13,x14) & Champaign_NN(x16) ...
- Lexical chains: develop <-> make and make <-> create:
  - exists x2 x3 x4 all e2 x1 x7 (develop_vb(e2,x7,x1) <-> make_vb(e2,x7,x1) & something_nn(x1) & new_jj(x1) & such_jj(x1) & product_nn(x2) & or_cc(x4,x1,x3) & mental_jj(x3) & artistic_jj(x3) & creation_nn(x3)).
  - all e1 x1 x2 (make_vb(e1,x1,x2) <-> create_vb(e1,x1,x2) & manufacture_vb(e1,x1,x2) & man-made_jj(x2) & product_nn(x2)).
- Linguistic axioms:
  - all x0 (mosaic_nn(x0) -> internet_nn(x0) & browser_nn(x0))
32. USC-ISI
- TextMap system (Ravichandran and Hovy, 2002; Hermjakob et al., 2003)
- Use of surface text patterns:
  - "When was X born?" ->
    - Mozart was born in 1756.
    - Gandhi (1869-1948)
  - Can be captured in expressions:
    - <NAME> was born in <BIRTHDATE>
    - <NAME> (<BIRTHDATE>-
  - These patterns can be learned.
33. USC-ISI: TextMap
- Use bootstrapping to learn patterns.
- For an identified question type ("When was X born?"), start with known answers for some values of X:
  - Mozart 1756
  - Gandhi 1869
  - Newton 1642
- Issue Web search engine queries (e.g. "Mozart 1756") and collect the top 1000 documents.
- Filter, tokenize, smooth, etc.
- Use a suffix-tree constructor to find the best substrings, e.g.:
  - Mozart (1756-1791)
- Filter:
  - Mozart (1756-
- Replace the query strings with e.g. <NAME> and <ANSWER>.
- Determine the precision of each pattern:
  - Find documents containing just the question term (Mozart)
  - Apply the patterns and calculate precision
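The precision-scoring step can be sketched as below; the template-to-regex conversion and the year-shaped answer slot are assumptions for illustration, not TextMap's actual machinery.

```python
import re

# Hedged sketch of pattern-precision scoring: apply a learned surface
# pattern to passages that contain only the question term, then score
# precision as (matches equal to the known answer) / (all matches).
def pattern_precision(template, texts, name, answer):
    rx = re.escape(template)               # treat '(' and '-' literally
    rx = rx.replace("<NAME>", re.escape(name))
    rx = rx.replace("<ANSWER>", r"(\d{4})")  # assume a year-like answer
    regex = re.compile(rx)
    hits = [m.group(1) for t in texts for m in regex.finditer(t)]
    return sum(h == answer for h in hits) / len(hits) if hits else 0.0

texts = ["Mozart (1756-1791) was a composer.",
         "A Mozart (1791- era program aired."]
print(pattern_precision("<NAME> (<ANSWER>-", texts, "Mozart", "1756"))  # 0.5
```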
34. USC-ISI: TextMap
- Finding answers:
  - Determine the question type
  - Perform an IR query
  - Do sentence segmentation and smoothing
  - Replace the question term by a question tag, i.e. replace "Mozart" with <NAME>
  - Search for instances of the patterns associated with the question type
  - Select words matching <ANSWER>
  - Assign scores according to the precision of the pattern
35. Insight
- (Soubbotin, 2002; Soubbotin & Soubbotin, 2003)
- Performed very well in TREC-10/11.
- Comprehensive and systematic use of indicative patterns. E.g.:
  - "capitalized word; paren; 4 digits; dash; 4 digits; paren" matches "Mozart (1756-1791)"
- The patterns are broader than named entities ("semantics in syntax").
- Patterns have intrinsic scores (reliability), independent of the question.
36. Insight
- Patterns with more sophisticated internal structure are more indicative of an answer.
- 2/3 of their correct entries in TREC-10 were answered by patterns. E.g.:
  - a = countries
  - b = official posts
  - w = proper names (first and last)
  - e = titles or honorifics
- Patterns for "Who is the President (Prime Minister) of a given country?":
  - abeww
  - ewwdb,a
  - b,aeww
- Definition questions (A is the primary query term, X is the answer):
  - <A; comma; a/an/the; X; comma/period>
    - For "Moulin Rouge, a cabaret"
  - <X; comma; also called; A; comma>
    - For "naturally occurring gas called methane"
  - <A; is/are; a/an/the; X>
37. Insight
- Emphasis on shallow techniques; little use of NLP.
- Look in the vicinity of a text string potentially matching a pattern for "zeroing" terms, e.g. for occupational roles:
  - Former
  - Elect
  - Deputy
  - Negation
- Comments:
  - Relies on the redundancy of a large corpus.
  - Works for the factoid question types of TREC-QA; not clear how it extends.
  - Not clear how they match questions to patterns.
  - Named entities within patterns have to be recognized.
38. Microsoft
- Data-Intensive QA (Brill et al., 2002).
- Overcoming the surface string mismatch between the question formulation and the string containing the answer.
- Approach based on the assumption/intuition that someone on the Web has answered the question in the same way it was asked.
- Want to avoid dealing with:
  - Lexical, syntactic, and semantic relationships (between Q and A)
  - Anaphora resolution
  - Synonymy
  - Alternate syntax
  - Indirect answers
- Take advantage of redundancy on the Web, then project to the TREC corpus (Answer-based QA).
39. Microsoft: AskMSR
- Formulate multiple queries; each rewrite has an intrinsic score. E.g. for "What is relative humidity?":
  - "is relative humidity", LEFT, 5
  - "relative is humidity", RIGHT, 5
  - "relative humidity is", RIGHT, 5
  - "relative humidity", NULL, 2
  - relative AND humidity, NULL, 1
- Get the top 100 documents from Google.
- Extract n-grams from the document summaries.
- Score n-grams by summing the scores of the rewrites they came from.
- Use tiling to merge n-grams.
- Search for supporting documents in the TREC corpus.
40. Microsoft: AskMSR
- Question: What is the rainiest place on Earth?
- Answer from the Web: Mount Waialeale.
- Passage in the TREC corpus: "In misty Seattle, Wash., last year, 32 inches of rain fell. Hong Kong gets about 80 inches a year, and even Pago Pago, noted for its prodigious showers, gets only about 196 inches annually. (The titleholder, according to the National Geographic Society, is Mount Waialeale in Hawaii, where about 460 inches of rain falls each year.)"
- Very difficult to imagine getting this passage by other means.
41. IBM Statistical QA (Ittycheriah, 2001)
- q = question, a = answer, c = correctness, e = answer type
- p(c|q,a) = Σ_e p(c,e|q,a) = Σ_e p(c|e,q,a) p(e|q,a)
- p(e|q,a) is the answer type model (ATM); p(c|e,q,a) is the answer selection model (ASM).
- The ATM predicts, from the question and a proposed answer, the answer type they both satisfy.
- Given a question, an answer, and the predicted answer type, the ASM seeks to model the correctness of this configuration.
- Distributions are modelled using a maximum entropy formulation.
- Training data: human judgments:
  - For the ATM, 13K questions annotated with 31 categories
  - For the ASM, 5K questions from TREC plus trivia
42. IBM Statistical QA (Ittycheriah)
- Question analysis (by the ATM):
  - Selects one of 31 categories
- Search:
  - Question expanded by Local Context Analysis
  - Top 1000 documents retrieved
- Passage extraction: top 100 passages that:
  - Maximize question word match
  - Have the desired answer type
  - Minimize dispersion of question words
  - Have similar syntactic structure to the question
- Answer extraction:
  - Candidate answers ranked using the ASM
43. IBM Rule-based
- Predictive Annotation (Prager 2000, Prager 2003)
- Want to make sure passages retrieved by the search engine have at least one candidate answer.
- Recognize that a candidate answer is of the correct answer type, which corresponds to a label (or several) generated by the Named Entity Recognizer.
- Annotate the entire corpus and index the semantic labels along with the text.
- Identify answer types in questions and include the corresponding labels in queries.
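The indexing idea can be illustrated with a toy in-memory "index" where semantic labels are stored as extra terms next to the tokens they label; the tiny tagger and query here are assumptions for illustration, not PIQUANT's components.

```python
# Minimal sketch of predictive annotation: index NE labels as
# pseudo-terms alongside the text, so a query like "PERSON invented
# baseball" only matches passages containing a person mention.
def annotate(tokens, tagger):
    out = []
    for tok in tokens:
        out.append(tok.lower())
        out += tagger.get(tok, [])   # add semantic labels as extra terms
    return out

TAGGER = {"Doubleday": ["PERSON"], "baseball": ["SPORT"]}  # toy NE tagger
doc = "baseball had been invented by Doubleday".split()
indexed_terms = set(annotate(doc, TAGGER))

query = {"person", "invented", "baseball"}
print(query <= {t.lower() for t in indexed_terms})  # True: passage matches
```

A real system would do this annotation over the whole corpus at indexing time, so the subsumption logic never runs inside the search engine.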
44. IBM: PIQUANT
- Predictive Annotation
- E.g. Question: Who invented baseball?
- "Who" can map to PERSON or ORGANIZATION.
- Suppose we assume only people invent things (it doesn't really matter).
- So "Who invented baseball?" -> PERSON invent baseball
- Consider the text: "... but its conclusion was based largely on the recollections of a man named Abner Graves, an elderly mining engineer, who reported that baseball had been 'invented' by Doubleday between 1839 and 1841."
45. IBM: PIQUANT
- Predictive Annotation
- Previous example: "Who invented baseball?" -> PERSON invent baseball
- However, the same structure is equally effective at answering:
  - "What sport did Doubleday invent?" -> SPORT invent Doubleday
46. IBM Rule-Based
- Handling subsumption and disjunction:
  - If an entity is of a type which has a parent type, how is annotation done?
  - If a proposed answer type has a parent type, what answer type should be used?
  - If an entity is ambiguous, what should the annotation be?
  - If the answer type is ambiguous, what should be used?
- Guidelines for these questions follow.
47. Subsumption and Disjunction
- Consider New York City: it is both a CITY and a PLACE.
  - To answer "Where did John Lennon die?", it needs to be a PLACE.
  - To answer "In what city is the Empire State Building?", it needs to be a CITY.
- Do NOT want to do the subsumption calculation in the search engine.
- Two scenarios:
  - 1. Expand the answer type and use the most specific entity annotation:
    - 1A: (CITY | PLACE) John_Lennon die, matches CITY
    - 1B: CITY Empire_State_Building, matches CITY
  - 2. Use the most specific answer type and multiple annotations of NYC:
    - 2A: PLACE John_Lennon die, matches (CITY | PLACE)
    - 2B: CITY Empire_State_Building, matches (CITY | PLACE)
- Case 2 is preferred for simplicity, because the disjunction in 1 would have to contain all hyponyms of PLACE, while the disjunction in 2 need only contain all hypernyms of CITY.
- Choice 2 suggests that a disjunction in the answer type can be used to represent ambiguity:
  - Who invented the laser? -> (PERSON | ORGANIZATION) invent laser
48. Clausal Classes
- Any structure that can be recognized in text can be annotated:
  - Quotations
  - Explanations
  - Methods
  - Opinions
  - ...
- Any semantic class label used in annotation can be indexed, and hence used as a target of search:
  - What did Karl Marx say about religion?
  - Why is the sky blue?
  - How do you make bread?
  - What does Arnold Schwarzenegger think about global warming?
49. Named Entity Recognition
50. IBM
- Predictive Annotation: improving precision at no cost to recall.
- E.g. Question: Where is Belize?
- "Where" can map to (CONTINENT, WORLDREGION, COUNTRY, STATE, CITY, CAPITAL, LAKE, RIVER, ...).
- But we know Belize is a country.
- So "Where is Belize?" -> (CONTINENT | WORLDREGION) Belize
- "Belize" occurs 1068 times in the TREC corpus.
- "Belize" and PLACE co-occur in only 537 sentences.
- "Belize" and CONTINENT or WORLDREGION co-occur in only 128 sentences.
53. Virtual Annotation (Prager 2001)
- Use WordNet to find all candidate answers (hypernyms).
- Use corpus co-occurrence statistics to select the best ones.
- Rather like the approach to WSD by Mihalcea and Moldovan (1999).
54. Parentage of "nematode"
Level Synset
0 nematode, roundworm
1 worm
2 invertebrate
3 animal, animate being, beast, brute, creature, fauna
4 life form, organism, being, living thing
5 entity, something
55. Parentage of "meerkat"
Level Synset
0 meerkat, mierkat
1 viverrine, viverrine mammal
2 carnivore
3 placental, placental mammal, eutherian, eutherian mammal
4 mammal
5 vertebrate, craniate
6 chordate
7 animal, animate being, beast, brute, creature, fauna
8 life form, organism, being, living thing
9 entity, something
56. Natural Categories
- "Basic Objects in Natural Categories", Rosch et al. (1976).
- According to psychological testing, these are categorization levels of intermediate specificity that people tend to use in unconstrained settings.
57. What is this?
58. What can we conclude?
- There are descriptive terms that people are drawn to use naturally.
- We can expect to find instances of these in text, in the right contexts.
- These terms will serve as good answers.
59. Virtual Annotation (cont.)
- Find all parents of the query term in WordNet.
- Look for co-occurrences of the query term and each parent in a text corpus.
- Expect to find snippets such as "... meerkats and other Y ...".
- Many different phrasings are possible, so we just look for proximity rather than parse.
- Scoring:
  - Count the co-occurrences of each parent with the search term, and divide by the level number (only for levels > 1), generating the Level-Adapted Count (LAC).
  - Exclude the very highest levels (too general).
  - Select the parent with the highest LAC plus any others with a LAC within 20%.
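The LAC selection can be sketched as follows; the counts are taken from the nematode example on the next slide, and the omission of the "exclude the highest levels" filter is a simplification.

```python
# Sketch of Level-Adapted Count (LAC) selection: divide each parent's
# co-occurrence count by its WordNet level, keep the best parent plus
# any within 20% of it.
def select_parents(counts_by_level):
    """counts_by_level: {parent: (wordnet_level, co-occurrence count)}."""
    lac = {p: c / max(l, 1) for p, (l, c) in counts_by_level.items()}
    best = max(lac.values())
    return sorted(p for p, v in lac.items() if v >= 0.8 * best)

parents = {"worm": (1, 13), "animal": (3, 2),
           "organism": (4, 3), "life form": (4, 2)}
print(select_parents(parents))  # ['worm']: LAC 13 dominates
```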
60. Parentage of "nematode" (with co-occurrence counts)
Level Synset
0 nematode, roundworm
1 worm(13)
2 invertebrate
3 animal(2), animate being, beast, brute, creature, fauna
4 life form(2), organism(3), being, living thing
5 entity, something
61. Parentage of "meerkat" (with co-occurrence counts)
Level Synset
0 meerkat, mierkat
1 viverrine, viverrine mammal
2 carnivore
3 placental, placental mammal, eutherian, eutherian mammal
4 mammal
5 vertebrate, craniate
6 chordate
7 animal(2), animate being, beast, brute, creature, fauna
8 life form, organism, being, living thing
9 entity, something
62. Sample Answer Passages
- Use Answer-based QA to locate answers.
- What is a nematode? ->
  - "Such genes have been found in nematode worms but not yet in higher animals."
- What is a meerkat? ->
  - "South African golfer Butch Kruger had a good round going in the central Orange Free State trials, until a mongoose-like animal grabbed his ball with its mouth and dropped down its hole. Kruger wrote on his card: 'Meerkat.'"
63. Use of Cyc as Sanity Checker
- Cyc: a large knowledge-base and inference engine (Lenat 1995).
- A post-hoc process for:
  - Rejecting insane answers:
    - How much does a grey wolf weigh? 300 tons
  - Boosting confidence for sane answers
- The sanity checker is invoked with:
  - Predicate, e.g. weight
  - Focus, e.g. grey wolf
  - Candidate value, e.g. 300 tons
- The sanity checker returns:
  - Sane: within 10% of the value in Cyc
  - Insane: outside of the reasonable range
    - Plan to use distributions instead of ranges
  - Don't know
- The confidence score is highly boosted when the answer is sane.
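The three-way contract just described can be sketched directly; the reference value and range for the grey wolf below are illustrative, not Cyc's actual facts.

```python
# Hedged sketch of the sanity-check contract: "sane" if within 10% of
# the reference value, "insane" if outside a reasonable range, else
# "don't know".
def sanity_check(reference, low, high, candidate):
    if reference and abs(candidate - reference) <= 0.10 * reference:
        return "sane"
    if candidate < low or candidate > high:
        return "insane"
    return "don't know"

# grey wolf weight in kg: assumed reference ~40, assumed range 20-80
print(sanity_check(40, 20, 80, 300_000))  # 300 tons -> "insane"
print(sanity_check(40, 20, 80, 38))       # within 10% -> "sane"
```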
64. Cyc Sanity Checking Example
- TREC-11 Q: What is the population of Maryland?
- Without sanity checking:
  - PIQUANT's top answer: 50,000
  - Justification: "Maryland's population is 50,000 and growing rapidly."
  - The passage discusses an exotic species (nutria), not humans.
- With sanity checking:
  - Cyc knows the population of Maryland is 5,296,486.
  - It rejects the insane top answers.
  - PIQUANT's new top answer: 5.1 million, with very high confidence.
65. AskMSR
- Process the question by:
  - Forming a search engine query from the original question
  - Detecting the answer type
- Get some results.
- Extract answers of the right type, based on how often they occur.
66. AskMSR
67. Step 1: Rewrite the questions
- Intuition: the user's question is often syntactically quite close to sentences that contain the answer.
  - Where is the Louvre Museum located? / The Louvre Museum is located in Paris.
  - Who created the character of Scrooge? / Charles Dickens created the character of Scrooge.
68. Query Rewriting
- Classify the question into one of seven categories:
  - Who is/was/are/were...?
  - When is/did/will/are/were...?
  - Where is/are/were...?
- Hand-crafted category-specific transformation rules, e.g. for "where" questions, move "is" to all possible locations:
  - Where is the Louvre Museum located?
    -> "is the Louvre Museum located"
    -> "the is Louvre Museum located"
    -> "the Louvre is Museum located"
    -> "the Louvre Museum is located"
    -> "the Louvre Museum located is"
- Look to the right of the query terms for the answer.
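The "move the copula" rewrite can be sketched in a few lines; dropping the wh-word and the bag-of-words backoff are assumptions modelled on the example, not the exact AskMSR rules.

```python
# Minimal sketch of AskMSR-style rewriting: drop the wh-word, then move
# "is" to every position; add a bag-of-words query as a backoff.
def rewrites(question):
    words = question.rstrip("?").split()
    words.pop(0)                             # drop the wh-word
    verb = words.pop(words.index("is"))      # assumes a single "is"
    cands = ['"%s"' % " ".join(words[:i] + [verb] + words[i:])
             for i in range(len(words) + 1)]
    cands.append(" AND ".join(words))        # bag-of-words backoff
    return cands

for q in rewrites("Where is the Louvre Museum located?"):
    print(q)
```

A real implementation would attach the per-rewrite weight and the LEFT/RIGHT/NULL hint shown on slide 39.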
69. Step 2: Query the search engine
- Send all rewrites to a Web search engine.
- Retrieve the top N answers (100-200).
- For speed, rely just on the search engine's snippets, not the full text of the actual documents.
70. Step 3: Gathering N-Grams
- Enumerate all n-grams (N = 1, 2, 3) in all retrieved snippets.
- Weight of an n-gram: its occurrence count, each occurrence weighted by the reliability (weight) of the rewrite rule that fetched the document.
- Example: Who created the character of Scrooge?
  - Dickens 117
  - Christmas Carol 78
  - Charles Dickens 75
  - Disney 72
  - Carl Banks 54
  - A Christmas 41
  - Christmas Carol 45
  - Uncle 31
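The n-gram gathering and weighting step can be sketched as follows; the snippets and weights are illustrative, not real search results.

```python
from collections import Counter

# Sketch of n-gram scoring: every unigram/bigram/trigram in a retrieved
# snippet earns the weight of the rewrite rule that fetched it.
def score_ngrams(snippets):
    scores = Counter()
    for text, weight in snippets:
        toks = text.lower().split()
        for n in (1, 2, 3):
            for i in range(len(toks) - n + 1):
                scores[" ".join(toks[i:i + n])] += weight
    return scores

snippets = [("Charles Dickens created the character", 5),
            ("character created by Charles Dickens", 5),
            ("a Christmas Carol by Dickens", 2)]
scores = score_ngrams(snippets)
print(scores["charles dickens"], scores["dickens"])  # 10 12
```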
71. Step 4: Filtering N-Grams
- Each question type is associated with one or more data-type filters (regular expressions for answer types).
- Boost the score of n-grams that match the expected answer type.
- Lower the score of n-grams that don't match.
72. Step 5: Tiling the Answers
- Overlapping n-grams are merged and the old n-grams discarded, e.g.:
  - "Charles Dickens" (20), "Dickens" (15), "Mr Charles" (10)
  - tile into "Mr Charles Dickens" (score 45)
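The tiling step can be sketched as a greedy merge over token sequences; this is toy logic illustrating the idea, not the AskMSR implementation.

```python
# Hedged sketch of answer tiling: repeatedly merge candidate n-grams
# whose token sequences overlap (or subsume one another), summing their
# scores, until no merge applies.
def overlap_merge(a, b):
    """Merge token lists if b overlaps a's tail or is contained in a."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            return a + b[k:]
    if " ".join(b) in " ".join(a):      # naive containment check
        return a
    return None

def tile(candidates):
    cands = [(c.split(), s) for c, s in candidates]
    merged = True
    while merged:
        merged = False
        for i in range(len(cands)):
            for j in range(len(cands)):
                if i != j:
                    combo = overlap_merge(cands[i][0], cands[j][0])
                    if combo is not None:
                        cands[i] = (combo, cands[i][1] + cands[j][1])
                        del cands[j]
                        merged = True
                        break
            if merged:
                break
    return [(" ".join(t), s) for t, s in cands]

print(tile([("Charles Dickens", 20), ("Dickens", 15), ("Mr Charles", 10)]))
# -> [('Mr Charles Dickens', 45)], matching the slide's merged score
```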
73. Results
- Standard TREC test-bed (TREC 2001): ~1M documents, 900 questions.
- The technique does OK, not great (it would have placed in the top 9 of ~30 participants).
- But with access to the Web, they do much better: they would have come in second on TREC 2001.
74. Issues
- In many scenarios (e.g., monitoring an individual's email) we only have a small set of documents.
- Works best (or only) for Trivial Pursuit-style fact-based questions.
- Limited/brittle repertoire of:
  - question categories
  - answer data types/filters
  - query rewriting rules
75. ISI: Surface Patterns Approach
- Use of characteristic phrases:
  - "When was <person> born?"
- Typical answers:
  - "Mozart was born in 1756."
  - "Gandhi (1869-1948)..."
- Suggests phrases like:
  - "<NAME> was born in <BIRTHDATE>"
  - "<NAME> (<BIRTHDATE>-"
- as regular expressions, which can help locate the correct answer.
76. Use Pattern Learning
- Example:
  - "The great composer Mozart (1756-1791) achieved fame at a young age"
  - "Mozart (1756-1791) was a genius"
  - "The whole world would always be indebted to the great music of Mozart (1756-1791)"
- The longest matching substring for all 3 sentences is "Mozart (1756-1791)".
- A suffix tree would extract "Mozart (1756-1791)" as an output, with a score of 3.
- Reminiscent of IE pattern learning.
77. Pattern Learning (cont.)
- Repeat with different examples of the same question type:
  - Gandhi 1869, Newton 1642, etc.
- Some patterns learned for BIRTHDATE:
  - a. born in <ANSWER>, <NAME>
  - b. <NAME> was born on <ANSWER>,
  - c. <NAME> ( <ANSWER> -
  - d. <NAME> ( <ANSWER> - )
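Applying learned patterns at answer time can be sketched as below; the template-to-regex conversion and the year-shaped answer slot are illustrative assumptions.

```python
import re

# Sketch of applying learned BIRTHDATE patterns: <NAME> is replaced by
# the question term and <ANSWER> becomes a capture group.
PATTERNS = ["<NAME> was born on <ANSWER> ,",
            "<NAME> ( <ANSWER> -"]

def extract(name, sentence):
    answers = []
    for p in PATTERNS:
        rx = re.escape(p)                     # treat '(' etc. literally
        rx = rx.replace("<NAME>", re.escape(name))
        rx = rx.replace("<ANSWER>", r"(\d{4})")
        answers += re.findall(rx, sentence)
    return answers

print(extract("Mozart", "Mozart ( 1756 - 1791 ) was a composer"))  # ['1756']
```

Each extracted answer would then be scored by the precision of the pattern that produced it.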
78. Experiments
- 6 different question types, from the Webclopedia QA Typology (Hovy et al., 2002a):
  - BIRTHDATE
  - LOCATION
  - INVENTOR
  - DISCOVERER
  - DEFINITION
  - WHY-FAMOUS
79. Experiments: Pattern Precision
- BIRTHDATE:
  - 1.0   <NAME> ( <ANSWER> - )
  - 0.85  <NAME> was born on <ANSWER>,
  - 0.6   <NAME> was born in <ANSWER>
  - 0.59  <NAME> was born <ANSWER>
  - 0.53  <ANSWER> <NAME> was born
  - 0.50  - <NAME> ( <ANSWER>
  - 0.36  <NAME> ( <ANSWER> -
- INVENTOR:
  - 1.0   <ANSWER> invents <NAME>
  - 1.0   the <NAME> was invented by <ANSWER>
  - 1.0   <ANSWER> invented the <NAME> in
80. Experiments (cont.)
- DISCOVERER:
  - 1.0   when <ANSWER> discovered <NAME>
  - 1.0   <ANSWER>'s discovery of <NAME>
  - 0.9   <NAME> was discovered by <ANSWER> in
- DEFINITION:
  - 1.0   <NAME> and related <ANSWER>
  - 1.0   form of <ANSWER>, <NAME>
  - 0.94  as <NAME>, <ANSWER> and
81. Experiments (cont.)
- WHY-FAMOUS:
  - 1.0   <ANSWER> <NAME> called
  - 1.0   laureate <ANSWER> <NAME>
  - 0.71  <NAME> is the <ANSWER> of
- LOCATION:
  - 1.0   <ANSWER>'s <NAME>
  - 1.0   regional <ANSWER> <NAME>
  - 0.92  near <NAME> in <ANSWER>
- Depending on the question type, the system gets a high MRR (0.6-0.9), with higher results from use of the Web than the TREC QA collection.
82. Shortcomings and Extensions
- Need for POS and/or semantic types:
  - "Where are the Rocky Mountains?"
  - "Denver's new airport, topped with white fiberglass cones in imitation of the Rocky Mountains in the background, continues to lie empty"
  - The pattern <NAME> in <ANSWER> extracts "background".
- An NE tagger and/or ontology could enable the system to determine that "background" is not a location.
83. Shortcomings (cont.)
- Long-distance dependencies:
  - "Where is London?"
  - "London, which has one of the busiest airports in the world, lies on the banks of the river Thames"
  - would require a pattern like: <QUESTION>, (<any_word>), lies on <ANSWER>
- The abundance and variety of Web data helps the system find an instance of its patterns without losing answers to long-distance dependencies.
84. Shortcomings (cont.)
- The system currently has only one anchor word:
  - Doesn't work for question types requiring multiple words from the question to be in the answer.
  - "In which county does the city of Long Beach lie?"
  - "Long Beach is situated in Los Angeles County"
  - required pattern: <Q_TERM_1> is situated in <ANSWER> <Q_TERM_2>
- Does not use case:
  - "What is a micron?"
  - "...a spokesman for Micron, a maker of semiconductors, said SIMMs are..."
  - If "Micron" had been capitalized in the question, this would be a perfect answer.
85. QA Typology from ISI (USC)
- Typology of typical question forms: 94 nodes (47 leaf nodes).
- Analyzed 17,384 questions (from answers.com).
86. Question Answering Example
- How hot does the inside of an active volcano get?
- get(TEMPERATURE, inside(volcano(active)))
- "lava fragments belched out of the mountain were as hot as 300 degrees Fahrenheit"
- fragments(lava, TEMPERATURE(degrees(300)), belched(out, mountain))
- volcano ISA mountain
- lava ISPARTOF volcano -> lava inside volcano
- fragments of lava HAVEPROPERTIESOF lava
- The needed semantic information is in WordNet definitions, and was successfully translated into a form that was used for rough proofs.
87. References
- Michele Banko, Eric Brill, Susan Dumais, Jimmy Lin. "AskMSR: Question Answering Using the Worldwide Web." In Proceedings of the 2002 AAAI Spring Symposium on Mining Answers from Texts and Knowledge Bases, March 2002. http://www.ai.mit.edu/people/jimmylin/publications/Banko-etal-AAAI02.pdf
- Susan Dumais, Michele Banko, Eric Brill, Jimmy Lin, Andrew Ng. "Web Question Answering: Is More Always Better?" http://research.microsoft.com/sdumais/SIGIR2002-QA-Submit-Conf.pdf
- D. Ravichandran and E.H. Hovy. "Learning Surface Text Patterns for a Question Answering System." In Proceedings of the ACL conference, July 2002.
88. Harder Questions
- Factoid question answering is really pretty silly.
- A more interesting task is one where the answers are fluid and depend on the fusion of material from disparate texts over time:
  - Who is Condoleezza Rice?
  - Who is Mahmoud Abbas?
  - Why was Arafat flown to Paris?