1
Question Answering
Marti Hearst, November 14, 2005
2
Question Answering
  • Outline
  • Introduction to QA
  • A typical full-fledged QA system
  • A very simple system, in response to this
  • An intermediate approach
  • Incorporating a reasoning system
  • Machine Learning of mappings
  • Other question types (e.g., biography,
    definitions)

3
A Spectrum of Search Types
  • What is the typical height of a giraffe?
  • What are some good ideas for landscaping my client's yard?
  • What are some promising untried treatments for Raynaud's disease?

4
Beyond Document Retrieval
  • Document Retrieval
  • Users submit queries corresponding to their
    information needs.
  • System returns (voluminous) list of full-length
    documents.
  • It is the responsibility of the users to find
    information of interest within the returned
    documents.
  • Open-Domain Question Answering (QA)
  • Users ask questions in natural language.
  • What is the highest volcano in Europe?
  • System returns list of short answers.
  • Under Mount Etna, the highest volcano
    in Europe, perches the fabulous town
  • A real use for NLP

5
Questions and Answers
  • What is the height of a typical giraffe?
  • The result can be a simple answer, extracted from
    existing web pages.
  • Can specify with keywords or a natural language
    query
  • However, most web search engines are not set up
    to handle questions properly.
  • Get different results using a question vs.
    keywords

6-9
(No transcript)
10
The Problem of Question Answering
Natural language question, not keyword queries
What is the nationality of Pope John Paul II?
"... stabilize the country with its help, the Catholic hierarchy stoutly held out for pluralism, in large part at the urging of Polish-born Pope John Paul II. When the Pope emphatically defended the Solidarity trade union during a 1987 tour of the ..."
Short text fragment, not a URL list
11
Question Answering from text
  • With massive collections of full-text documents, simply finding relevant documents is of limited use: we want answers
  • QA: give the user a (short) answer to their question, perhaps supported by evidence
  • An alternative to standard IR
  • The first problem area in IR where NLP is really making a difference

12
People want to ask questions
Examples from the AltaVista query log:
  who invented surf music?
  how to make stink bombs
  where are the snowdens of yesteryear?
  which english translation of the bible is used in official catholic liturgies?
  how to do clayart
  how to copy psx
  how tall is the sears tower?
Examples from the Excite query log (12/1999):
  how can i find someone in texas
  where can i find information on puritan religion?
  what are the 7 wonders of the world
  how can i eliminate stress
  What vacuum cleaner does Consumers Guide recommend
13
A Brief (Academic) History
  • In some sense question answering is not a new
    research area
  • Question answering systems can be found in many
    areas of NLP research, including
  • Natural language database systems
  • A lot of early NLP work on these
  • Problem-solving systems
  • STUDENT (Winograd 77)
  • LUNAR (Woods & Kaplan 77)
  • Spoken dialog systems
  • Currently very active and commercially relevant
  • The focus on open-domain QA is new
  • First modern system: MURAX (Kupiec, SIGIR '93)
  • Trivial Pursuit questions
  • Encyclopedia answers
  • FAQFinder (Burke et al. '97)
  • TREC QA competition (NIST, 1999-present)

14
AskJeeves
  • AskJeeves is probably the most hyped example of question answering
  • How it used to work:
  • Pattern-match the question against their own knowledge base of questions
  • If a match is found, return a human-curated answer to that known question
  • If that fails, fall back to regular web search
  • (Seems to be more of a meta-search engine now)
  • A potentially interesting middle ground, but a fairly weak shadow of real QA

15
Question Answering at TREC
  • The question answering track at TREC consists of answering a set of 500 fact-based questions, e.g., "When was Mozart born?"
  • It has really pushed the field forward.
  • The document set:
  • Newswire text from the LA Times, San Jose Mercury News, Wall Street Journal, NY Times, etc.; over 1M documents now
  • Well-formed lexically, syntactically and semantically (the documents were reviewed by professional editors)
  • The questions:
  • Hundreds of new questions every year; the total is 2400
  • Task:
  • Initially: extract at most 5 answers, long (250 bytes) and short (50 bytes)
  • Now: extract only one exact answer
  • Several other sub-tasks were added later: definition, list, biography

16
Sample TREC questions
1. Who is the author of the book, "The Iron Lady: A Biography of Margaret Thatcher"?
2. What was the monetary value of the Nobel Peace Prize in 1989?
3. What does the Peugeot company manufacture?
4. How much did Mercury spend on advertising in 1993?
5. What is the name of the managing director of Apricot Computer?
6. Why did David Koresh ask the FBI for a word processor?
7. What is the name of the rare neurological disease with symptoms such as involuntary movements (tics), swearing, and incoherent vocalizations (grunts, shouts, etc.)?
17
TREC Scoring
  • For the first three years, systems were allowed to return 5 ranked answer snippets (50/250 bytes) for each question.
  • Mean Reciprocal Rank (MRR) scoring:
  • Each question is assigned the reciprocal rank of its first correct answer: if the first correct answer is at position k, the score is 1/k.
  • Scores of 1, 0.5, 0.33, 0.25, 0.2 for positions 1-5; 0 if no correct answer appears in the top 5.
  • Mainly named-entity answers (person, place, date, ...)
  • From 2002 on, systems are only allowed to return a single exact answer, and a notion of confidence has been introduced.
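A minimal sketch of MRR scoring as described above (not TREC's official scorer); each question contributes the reciprocal rank of its first correct answer, or 0 if none appears in the top 5:

```python
def mrr(first_correct_ranks, cutoff=5):
    """Mean Reciprocal Rank over per-question ranks.

    first_correct_ranks: 1-based rank of the first correct answer for each
    question, or None if no correct answer was returned.
    """
    total = 0.0
    for rank in first_correct_ranks:
        if rank is not None and rank <= cutoff:
            total += 1.0 / rank
    return total / len(first_correct_ranks)

# Correct answers at ranks 1 and 3, plus one unanswered question:
# mrr([1, 3, None]) == (1 + 1/3 + 0) / 3 ≈ 0.44
```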

18
Top Performing Systems
  • In 2003, the best performing systems at TREC could answer approximately 60-70% of the questions
  • Approaches and successes have varied a fair deal
  • Knowledge-rich approaches, using a vast array of NLP techniques, stole the show in 2000-2003
  • Notably Harabagiu, Moldovan et al. (SMU/UTD/LCC)
  • Statistical systems are starting to catch up
  • The AskMSR system stressed how much could be achieved by very simple methods with enough text (and now has various copycats)
  • People are experimenting with machine learning methods
  • A middle ground is to use a large collection of surface matching patterns (ISI)

19
Example QA System
  • This system contains many components used by other systems, but is more complex in some ways
  • Most work was completed in 2001; there have been advances by this group and others since then
  • The next slides are based mainly on:
  • Pasca and Harabagiu, High-Performance Question Answering from Large Text Collections, SIGIR '01
  • Pasca and Harabagiu, Answer Mining from Online Documents, ACL '01
  • Harabagiu, Pasca, Maiorano, Experiments with Open-Domain Textual Question Answering, COLING '00

20
QA Block Architecture
[Architecture diagram: Q → Question Processing (producing keywords and question semantics) → Passage Retrieval (over Document Retrieval, producing passages) → Answer Extraction → A; each stage draws on WordNet, a parser, and named-entity recognition (NER).]
21
Question Processing Flow
[Flow diagram: Q → question parsing → construction of the question representation → answer type detection (yielding the AT category) and keyword selection (yielding keywords), producing the question semantic representation.]
22
Question Stems and Answer Types
Identify the semantic category of expected answers
Question | Question stem | Answer type
Q555 What was the name of the Titanic's captain? | What | Person
Q654 What U.S. Government agency registers trademarks? | What | Organization
Q162 What is the capital of Kosovo? | What | City
Q661 How much does one ton of cement cost? | How much | Quantity

  • Other question stems: Who, Which, Name, How hot...
  • Other answer types: Country, Number, Product...

23
Detecting the Expected Answer Type
  • In some cases, the question stem is sufficient to indicate the answer type (AT):
  • Why → REASON
  • When → DATE
  • In many cases, the question stem is ambiguous
  • Examples:
  • What was the name of the Titanic's captain?
  • What U.S. Government agency registers trademarks?
  • What is the capital of Kosovo?
  • Solution: select additional question concepts (AT words) that help disambiguate the expected answer type
  • Examples:
  • captain
  • agency
  • capital

24
Answer Type Taxonomy
  • Encodes 8707 English concepts to help recognize
    expected answer type
  • Mapping to parts of WordNet done by hand
  • Can connect to Noun, Adj, and/or Verb
    subhierarchies

25
Answer Type Detection Algorithm
  • Select the answer type word from the question representation:
  • Select the word(s) connected to the question. Some content-free words are skipped (e.g., "name").
  • From this set, select the word with the highest connectivity in the question representation.
  • Map the AT word into a previously built AT hierarchy:
  • The AT hierarchy is based on WordNet, with some concepts associated with semantic categories, e.g., writer → PERSON.
  • Select the AT(s) from the first hypernym(s) associated with a semantic category.
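A minimal sketch of this hypernym-climbing step using NLTK's WordNet interface; the CATEGORY_ANCHORS table is a tiny hypothetical stand-in for the hand-built mapping of 8707 concepts mentioned earlier:

```python
from nltk.corpus import wordnet as wn

# Hypothetical stand-in for the hand-built concept-to-answer-type mapping;
# only a few anchor synsets are listed here.
CATEGORY_ANCHORS = {
    "person.n.01": "PERSON",
    "organization.n.01": "ORGANIZATION",
    "location.n.01": "LOCATION",
    "time_period.n.01": "DATE",
}

def answer_type(at_word):
    """Climb the hypernym hierarchy of the AT word until a synset mapped
    to a semantic category is found."""
    for synset in wn.synsets(at_word, pos=wn.NOUN):
        frontier, seen = [synset], set()
        while frontier:
            current = frontier.pop(0)
            if current.name() in CATEGORY_ANCHORS:
                return CATEGORY_ANCHORS[current.name()]
            seen.add(current)
            frontier.extend(h for h in current.hypernyms() if h not in seen)
    return None

# answer_type("captain") and answer_type("agency") should reach the
# person.n.01 and organization.n.01 anchors, respectively.
```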

26
Answer Type Hierarchy
[Diagram: a fragment of the answer type hierarchy rooted at PERSON.]
27
Evaluation of Answer Type Hierarchy
  • This evaluation was done in 2001
  • Controlled the variation of the number of WordNet synsets included in the answer type hierarchy
  • Tested on 800 TREC questions

Hierarchy coverage (%) | Precision score (50-byte answers)
0 | 0.296
3 | 0.404
10 | 0.437
25 | 0.451
50 | 0.461
  • The derivation of the answer type is the main
    source of unrecoverable errors in the QA system

28
Keyword Selection
  • The answer type indicates what the question is looking for, but it provides insufficient context to locate the answer in a very large document collection
  • Lexical terms (keywords) from the question, possibly expanded with lexical/semantic variations, provide the required context

29
Lexical Term Extraction
  • Questions are approximated by sets of unrelated words (lexical terms)
  • Similar to bag-of-words IR models

Question (from TREC QA track) | Lexical terms
Q002 What was the monetary value of the Nobel Peace Prize in 1989? | monetary, value, Nobel, Peace, Prize
Q003 What does the Peugeot company manufacture? | Peugeot, company, manufacture
Q004 How much did Mercury spend on advertising in 1993? | Mercury, spend, advertising, 1993
Q005 What is the name of the managing director of Apricot Computer? | name, managing, director, Apricot, Computer
30
Keyword Selection Algorithm
  1. Select all non-stopwords in quotations
  2. Select all NNP words in recognized named entities
  3. Select all complex nominals with their adjectival
    modifiers
  4. Select all other complex nominals
  5. Select all nouns with adjectival modifiers
  6. Select all other nouns
  7. Select all verbs
  8. Select the AT word (which was skipped in all
    previous steps)
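A much-simplified sketch of the priority ordering above; the quoted phrases, named entities, nominals, nouns, and verbs are assumed to be precomputed by a parser and NER and passed in as groups ordered by heuristics 1-7:

```python
def select_keywords(word_groups, at_word, stopwords):
    """word_groups: list of word lists, one per heuristic (1-7), highest priority first."""
    keywords = []
    for group in word_groups:
        for word in group:
            if word.lower() in stopwords or word == at_word or word in keywords:
                continue
            keywords.append(word)
    keywords.append(at_word)   # heuristic 8: the AT word comes last
    return keywords

# e.g. for "What is the capital of Kosovo?":
# select_keywords([[], ["Kosovo"], [], [], [], [], []], "capital", {"the", "of"})
# -> ["Kosovo", "capital"]
```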

31
Keyword Selection Examples
  • What researcher discovered the vaccine against
    Hepatitis-B?
  • Hepatitis-B, vaccine, discover, researcher
  • What is the name of the French oceanographer who
    owned Calypso?
  • Calypso, French, own, oceanographer
  • What U.S. government agency registers trademarks?
  • U.S., government, trademarks, register, agency
  • What is the capital of Kosovo?
  • Kosovo, capital

32
Passage Retrieval
[Architecture diagram repeated from slide 20; this section covers the Passage Retrieval stage.]
33
Passage Extraction Loop
  • Passage Extraction Component
  • Extracts passages that contain all selected keywords
  • Passage size: dynamic
  • Start position: dynamic
  • Passage quality and keyword adjustment
  • In the first iteration, use the first 6 keyword selection heuristics
  • If the number of passages is lower than a threshold → the query is too strict → drop a keyword
  • If the number of passages is higher than a threshold → the query is too relaxed → add a keyword
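A minimal sketch of this add/drop loop; the retrieve() callback and the thresholds are assumptions, not the actual SMU/LCC implementation:

```python
def retrieve_passages(keywords, retrieve, min_passages=10, max_passages=500):
    """keywords: ordered from most to least reliable (selection heuristics 1-6)."""
    active = list(keywords)
    dropped = []
    seen = set()
    passages = []
    while tuple(active) not in seen:
        seen.add(tuple(active))
        passages = retrieve(active)       # passages containing ALL active keywords
        if len(passages) < min_passages and len(active) > 1:
            dropped.append(active.pop())  # query too strict: drop the weakest keyword
        elif len(passages) > max_passages and dropped:
            active.append(dropped.pop())  # query too relaxed: add a keyword back
        else:
            break
    return passages
```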

34
Passage Retrieval Architecture
[Flow diagram: keywords → Passage Extraction (over documents from Document Retrieval) → passages → passage quality check; if quality is inadequate, Keyword Adjustment feeds back into extraction; otherwise Passage Scoring and Passage Ordering produce ranked passages.]
35
Passage Scoring
  • Passages are scored based on keyword windows
  • For example, if a question has a set of keywords
    k1, k2, k3, k4, and in a passage k1 and k2 are
    matched twice, k3 is matched once, and k4 is not
    matched, the following windows are built

[Diagram: four keyword windows built over the passage "... k1 k2 ... k3 k2 k1 ...", one window for each combination of the two matches of k1 and the two matches of k2.]
36
Passage Scoring
  • Passage ordering is performed using a radix sort
    that involves three scores
  • SameWordSequenceScore (largest)
  • Computes the number of words from the question
    that are recognized in the same sequence in the
    window
  • DistanceScore (largest)
  • The number of words that separate the most
    distant keywords in the window
  • MissingKeywordScore (smallest)
  • The number of unmatched keywords in the window
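A small sketch of the ordering step: sorting lexicographically on the three scores (largest SameWordSequenceScore, then largest DistanceScore, then smallest MissingKeywordScore) has the same effect as the radix sort described above:

```python
def order_passages(windows):
    """windows: list of dicts with keys 'same_word_sequence', 'distance',
    and 'missing_keywords' (plus whatever identifies the passage)."""
    return sorted(
        windows,
        key=lambda w: (w["same_word_sequence"],   # most significant, largest first
                       w["distance"],             # next, largest first
                       -w["missing_keywords"]),   # least significant, smallest first
        reverse=True,
    )
```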

37
Answer Extraction
Question Semantics
Passage Retrieval
Answer Extraction
Question Processing
Q
A
Passages
Keywords
WordNet
WordNet
Document Retrieval
Parser
Parser
NER
NER
38
Ranking Candidate Answers
Q066: Name the first private citizen to fly in space.
  • Answer type: Person
  • Text passage: "Among them was Christa McAuliffe, the first private citizen to fly in space. Karen Allen, best known for her starring role in Raiders of the Lost Ark, plays McAuliffe. Brian Kerwin is featured as shuttle pilot Mike Smith..."
  • Best candidate answer: Christa McAuliffe

39
Features for Answer Ranking
  • relNMW: number of question terms matched in the answer passage
  • relSP: number of question terms matched in the same phrase as the candidate answer
  • relSS: number of question terms matched in the same sentence as the candidate answer
  • relFP: flag set to 1 if the candidate answer is followed by a punctuation sign
  • relOCTW: number of question terms matched, separated from the candidate answer by at most three words and one comma
  • relSWS: number of terms occurring in the same order in the answer passage as in the question
  • relDTW: average distance from the candidate answer to question term matches

SIGIR '01
40
Answer Ranking based on Machine Learning
  • A relative relevance score is computed for each pair of candidates (answer windows):
  • relPAIR = wSWS × ΔrelSWS + wFP × ΔrelFP + wOCTW × ΔrelOCTW + wSP × ΔrelSP + wSS × ΔrelSS + wNMW × ΔrelNMW + wDTW × ΔrelDTW - threshold
  • If relPAIR is positive, the first candidate of the pair is more relevant
  • A perceptron model is used to learn the weights
  • Scores in the 50% MRR range for short answers, in the 60% range for long answers
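A minimal sketch of how such pairwise weights could be learned with a perceptron; the feature layout, learning rate, and epoch count are assumptions, not the setup from the SIGIR '01 paper:

```python
def train_pairwise_perceptron(pairs, n_features=7, epochs=20, lr=0.1):
    """pairs: list of (delta, label), where delta is the list of the seven
    Δrel feature differences between two candidates and label is +1 if the
    first candidate is the better answer, -1 otherwise."""
    weights = [0.0] * n_features
    threshold = 0.0
    for _ in range(epochs):
        for delta, label in pairs:
            score = sum(w * d for w, d in zip(weights, delta)) - threshold
            predicted = 1 if score > 0 else -1
            if predicted != label:               # standard perceptron update
                weights = [w + lr * label * d for w, d in zip(weights, delta)]
                threshold -= lr * label
    return weights, threshold
```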

41
Evaluation on the Web
  • Test on 350 questions from TREC (Q250-Q600)
  • Extract 250-byte answers

 | Google | Answer extraction from Google | AltaVista | Answer extraction from AltaVista
Precision score | 0.29 | 0.44 | 0.15 | 0.37
Questions with a correct answer among the top 5 returned answers | 0.44 | 0.57 | 0.27 | 0.45
42
Can we make this simpler?
  • One reason systems became so complex is that they have to pick out one sentence within a small collection
  • The answer is likely to be stated in a hard-to-recognize manner
  • Alternative idea:
  • What happens with a much larger collection?
  • The web is so huge that you're likely to see the answer stated in a form similar to the question
  • Goal: make the simplest possible QA system by exploiting this redundancy on the web
  • Use this as a baseline against which to compare more elaborate systems
  • The next slides are based on:
  • Web Question Answering: Is More Always Better?, Dumais, Banko, Brill, Lin, Ng, SIGIR '02
  • An Analysis of the AskMSR Question-Answering System, Brill, Dumais, and Banko, EMNLP '02

43
AskMSR System Architecture
[System diagram: numbered steps 1-5, described on the following slides.]
44
Step 1: Rewrite the questions
  • Intuition: the user's question is often syntactically quite close to sentences that contain the answer
  • Where is the Louvre Museum located?
  • The Louvre Museum is located in Paris
  • Who created the character of Scrooge?
  • Charles Dickens created the character of Scrooge.

45
Query rewriting
  • Classify the question into one of seven categories:
  • Who is/was/are/were...?
  • When is/did/will/are/were...?
  • Where is/are/were...?
  • a. Hand-crafted category-specific transformation rules, e.g., for "where" questions, move "is" to all possible locations
  • Look to the right of the query terms for the answer.
  • Where is the Louvre Museum located?
  • → "is the Louvre Museum located"
  • → "the is Louvre Museum located"
  • → "the Louvre is Museum located"
  • → "the Louvre Museum is located"
  • → "the Louvre Museum located is"
  • b. Expected answer datatype (e.g., Date, Person, Location, ...)
  • When was the French Revolution? → DATE

Nonsense, but OK. It's only a few more queries to the search engine.
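A toy version of the "move the verb" rewrite for "where" questions; the category detection and weighting are omitted, and this is only an illustration of the idea, not Microsoft's actual rules:

```python
def rewrite_where_question(question):
    """'Where is the Louvre Museum located?' ->
    ['is the Louvre Museum located', 'the is Louvre Museum located', ...]"""
    words = question.rstrip("?").split()
    assert words[0].lower() == "where" and words[1].lower() in {"is", "are", "was", "were"}
    verb, rest = words[1], words[2:]
    rewrites = []
    for position in range(len(rest) + 1):
        candidate = rest[:position] + [verb] + rest[position:]
        rewrites.append(" ".join(candidate))
    return rewrites
```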
46
Query Rewriting - weighting
  • Some query rewrites are more reliable than
    others.

Where is the Louvre Museum located?
  "the Louvre Museum is located" (weight 5): if it matches, the answer is probably right
  "Louvre Museum located" (weight 1): lots of non-answers could come back too
47
Step 2: Query the search engine
  • Send all rewrites to a Web search engine
  • Retrieve the top N answers (100-200)
  • For speed, rely just on the search engine's snippets, not the full text of the actual documents

48
Step 3: Gathering N-Grams
  • Enumerate all N-grams (N = 1, 2, 3) in all retrieved snippets
  • Weight of an n-gram: its occurrence count, each occurrence weighted by the reliability (weight) of the rewrite rule that fetched the document
  • Example: Who created the character of Scrooge?
  • Dickens 117
  • Christmas Carol 78
  • Charles Dickens 75
  • Disney 72
  • Carl Banks 54
  • A Christmas 41
  • Christmas Carol 45
  • Uncle 31
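A sketch of steps 2-3: mine unigrams, bigrams, and trigrams from the returned snippets and weight each occurrence by the reliability of the rewrite that retrieved it (the data structures here are assumptions):

```python
from collections import Counter

def collect_ngrams(snippets_with_weights, max_n=3):
    """snippets_with_weights: iterable of (snippet_text, rewrite_weight)."""
    scores = Counter()
    for text, weight in snippets_with_weights:
        tokens = text.split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                scores[" ".join(tokens[i:i + n])] += weight
    return scores

# scores = collect_ngrams(search_results)
# scores.most_common(10)  -> top candidate answer strings
```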

49
Step 4: Filtering N-Grams
  • Each question type is associated with one or more data-type filters (regular expressions):
  • When → Date
  • Where → Location
  • What
  • Who → Person
  • Boost the score of n-grams that match the regexp
  • Lower the score of n-grams that don't match the regexp
  • Details omitted from the paper.
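A sketch of the filtering step with illustrative regular expressions; the paper does not give its actual filters, so these patterns are stand-ins for the idea:

```python
import re

# Illustrative data-type filters, not the paper's actual expressions.
FILTERS = {
    "when": re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b"),        # looks like a year
    "who": re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b"),   # capitalized name
}

def rescore(ngram_scores, question_type, boost=2.0, penalty=0.5):
    """Boost n-grams matching the filter for this question type; lower the rest."""
    pattern = FILTERS.get(question_type)
    if pattern is None:
        return ngram_scores
    return {g: s * (boost if pattern.search(g) else penalty)
            for g, s in ngram_scores.items()}
```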
50
Step 5: Tiling the Answers
[Diagram: candidate n-grams "Charles Dickens" (score 20), "Dickens" (15), and "Mr Charles" (10); the highest-scoring n-gram is tiled with overlapping n-grams into "Mr Charles Dickens" (score 45), the old n-grams are discarded, and the process repeats until no more overlap remains.]
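A greedy sketch of answer tiling: repeatedly merge the highest-scoring n-gram with any overlapping n-gram, summing their scores, until nothing overlaps (the exact merging rules in the paper may differ):

```python
def overlap_merge(a, b):
    """Return the merged phrase if phrases a and b overlap at the word level."""
    ta, tb = a.split(), b.split()
    if a in b:
        return b
    if b in a:
        return a
    for k in range(min(len(ta), len(tb)), 0, -1):
        if ta[-k:] == tb[:k]:
            return " ".join(ta + tb[k:])
        if tb[-k:] == ta[:k]:
            return " ".join(tb + ta[k:])
    return None

def tile(scores):
    candidates = dict(scores)                    # phrase -> score
    changed = True
    while changed:
        changed = False
        best = max(candidates, key=candidates.get)
        for other in list(candidates):
            if other == best:
                continue
            merged = overlap_merge(best, other)
            if merged is not None:               # merge and discard the old n-grams
                new_score = candidates.pop(best) + candidates.pop(other)
                candidates[merged] = new_score
                changed = True
                break
    return candidates

# tile({"Charles Dickens": 20, "Dickens": 15, "Mr Charles": 10})
# -> {"Mr Charles Dickens": 45}
```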
51
Results
  • Standard TREC contest test-bed (TREC 2001): 1M documents; 900 questions
  • The technique doesn't do too well (though it would have placed in the top 9 of 30 participants)
  • MRR strict: 0.34
  • MRR lenient: 0.43
  • 9th place

52
Results
  • From the EMNLP '02 paper:
  • MRR of 0.577; answers 61% of questions correctly
  • Would be near the top of the TREC-9 runs
  • Breakdown of feature contribution

53
Issues
  • Works best/only for Trivial Pursuit-style
    fact-based questions
  • Limited/brittle repertoire of
  • question categories
  • answer data types/filters
  • query rewriting rules

54
Intermediate ApproachSurface pattern discovery
  • Based on:
  • Ravichandran, D. and Hovy, E.H., Learning Surface Text Patterns for a Question Answering System, ACL '02
  • Hovy et al., Question Answering in Webclopedia, TREC-9, 2000
  • Use of characteristic phrases:
  • "When was <person> born?"
  • Typical answers:
  • "Mozart was born in 1756."
  • "Gandhi (1869-1948)..."
  • Suggests regular expressions to help locate the correct answer:
  • "<NAME> was born in <BIRTHDATE>"
  • "<NAME> (<BIRTHDATE>-"

55
Use Pattern Learning
  • Examples
  • The great composer Mozart (1756-1791) achieved
    fame at a young age
  • Mozart (1756-1791) was a genius
  • The whole world would always be indebted to the
    great music of Mozart (1756-1791)
  • The longest matching substring for all 3 sentences is "Mozart (1756-1791)"
  • A suffix tree would extract "Mozart (1756-1791)" as an output, with a score of 3
  • Reminiscent of IE pattern learning

56
Pattern Learning (cont.)
  • Repeat with different examples of the same question type
  • Gandhi 1869, Newton 1642, etc.
  • Some patterns learned for BIRTHDATE:
  • a. born in <ANSWER>, <NAME>
  • b. <NAME> was born on <ANSWER>,
  • c. <NAME> (<ANSWER> -
  • d. <NAME> (<ANSWER> - )
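A rough sketch of the pattern-learning idea: substitute the known question term and answer into each example sentence, then keep the substring common to all of them. The paper uses suffix trees; pairwise longest-common-substring via difflib is used here purely for illustration:

```python
from difflib import SequenceMatcher

def to_template(sentence, name, answer):
    """Replace the known question term and answer with placeholder tags."""
    return sentence.replace(name, "<NAME>").replace(answer, "<ANSWER>")

def common_pattern(templates):
    """Iteratively shrink to the longest substring shared by all templates."""
    pattern = templates[0]
    for other in templates[1:]:
        match = SequenceMatcher(None, pattern, other).find_longest_match(
            0, len(pattern), 0, len(other))
        pattern = pattern[match.a:match.a + match.size]
    return pattern

sentences = [
    ("The great composer Mozart (1756-1791) achieved fame at a young age", "Mozart", "1756"),
    ("Mozart (1756-1791) was a genius", "Mozart", "1756"),
    ("The whole world would always be indebted to the great music of Mozart (1756-1791)", "Mozart", "1756"),
]
templates = [to_template(s, n, a) for s, n, a in sentences]
print(common_pattern(templates))   # e.g. "<NAME> (<ANSWER>-1791)"
```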

57
QA Typology from ISI
  • Typology of typical question forms: 94 nodes (47 leaf nodes)
  • Analyzed 17,384 questions (from answers.com)

58
Experiments
  • 6 different question types
  • from Webclopedia QA Typology
  • BIRTHDATE
  • LOCATION
  • INVENTOR
  • DISCOVERER
  • DEFINITION
  • WHY-FAMOUS

59
Experiments pattern precision
  • BIRTHDATE
  • 1.0 <NAME> (<ANSWER> - )
  • 0.85 <NAME> was born on <ANSWER>,
  • 0.6 <NAME> was born in <ANSWER>
  • 0.59 <NAME> was born <ANSWER>
  • 0.53 <ANSWER> <NAME> was born
  • 0.50 - <NAME> (<ANSWER>
  • 0.36 <NAME> (<ANSWER> -
  • INVENTOR
  • 1.0 <ANSWER> invents <NAME>
  • 1.0 the <NAME> was invented by <ANSWER>
  • 1.0 <ANSWER> invented the <NAME> in

60
Experiments (cont.)
  • DISCOVERER
  • 1.0 when <ANSWER> discovered <NAME>
  • 1.0 <ANSWER>'s discovery of <NAME>
  • 0.9 <NAME> was discovered by <ANSWER> in
  • DEFINITION
  • 1.0 <NAME> and related <ANSWER>
  • 1.0 form of <ANSWER>, <NAME>
  • 0.94 as <NAME>, <ANSWER> and

61
Experiments (cont.)
  • WHY-FAMOUS
  • 1.0 <ANSWER> <NAME> called
  • 1.0 laureate <ANSWER> <NAME>
  • 0.71 <NAME> is the <ANSWER> of
  • LOCATION
  • 1.0 <ANSWER>'s <NAME>
  • 1.0 regional <ANSWER> <NAME>
  • 0.92 near <NAME> in <ANSWER>
  • Depending on question type, get high MRR (0.6-0.9), with higher results from use of the Web than the TREC QA collection

62
Shortcomings Extensions
  • Need for POS and/or semantic types:
  • "Where are the Rocky Mountains?"
  • "Denver's new airport, topped with white fiberglass cones in imitation of the Rocky Mountains in the background, continues to lie empty"
  • <NAME> in <ANSWER>
  • An NE tagger and/or ontology could enable the system to determine that "background" is not a location

63
Shortcomings... (cont.)
  • Long-distance dependencies:
  • "Where is London?"
  • "London, which has one of the busiest airports in the world, lies on the banks of the river Thames"
  • Would require a pattern like: <QUESTION>, (<any_word>), lies on <ANSWER>
  • The abundance and variety of Web data helps the system find an instance of a pattern without losing answers to long-distance dependencies

64
Shortcomings... (cont.)
  • The system currently has only one anchor word
  • Doesn't work for question types requiring multiple words from the question to appear in the answer
  • "In which county does the city of Long Beach lie?"
  • "Long Beach is situated in Los Angeles County"
  • Required pattern: <Q_TERM_1> is situated in <ANSWER> <Q_TERM_2>
  • Does not use case:
  • "What is a micron?"
  • "...a spokesman for Micron, a maker of semiconductors, said SIMMs are..."
  • If "Micron" had been capitalized in the question, this would be a perfect answer

65
The Importance of NER
  • The results of the past 5 TREC evaluations of QA systems indicate that current state-of-the-art QA performance is determined by the recognition of Named Entities
  • In TREC 2003 the LCC QA system extracted 289 correct answers for factoid questions
  • The Named Entity Recognizer was responsible for 234 of them

QUANTITY 55 ORGANIZATION 15 PRICE 3
NUMBER 45 AUTHORED WORK 11 SCIENCE NAME 2
DATE 35 PRODUCT 11 ACRONYM 1
PERSON 31 CONTINENT 5 ADDRESS 1
COUNTRY 21 PROVINCE 5 ALPHABET 1
OTHER LOCATIONS 19 QUOTE 5 URI 1
CITY 19 UNIVERSITY 3
66
The Special Case of Names
Questions asking for names of authored works:
1934 What is the play West Side Story based on? | Answer: Romeo and Juliet
1976 What is the motto for the Boy Scouts? | Answer: Be prepared.
1982 What movie won the Academy Award for best picture in 1989? | Answer: Driving Miss Daisy
2080 What peace treaty ended WWI? | Answer: Versailles
2102 What American landmark stands on Liberty Island? | Answer: Statue of Liberty
67
Problems
  • NE recognition assumes all answers are named entities
  • This oversimplifies the generative power of language!
  • What about "What kind of flowers did Van Gogh paint?"
  • Does not account well for morphological, lexical, and semantic alternations
  • Question terms may not exactly match answer terms; connections between alternations of Q and A terms are often not documented in a flat dictionary
  • Example: "When was Berlin's Brandenburger Tor erected?" has no guarantee of matching "built"
  • Recall suffers

68
LCC Approach: WordNet to the rescue!
  • WordNet can be used to inform all three steps of the Q/A process:
  • 1. Answer-type recognition (Answer Type Taxonomy)
  • 2. Passage retrieval (specificity constraints)
  • 3. Answer extraction (recognition of keyword alternations)
  • Using WN's lexico-semantic info, examples:
  • What kind of flowers did Van Gogh paint?
  • Answer-type recognition: need to know (a) that the answer is a kind of flower, and (b) the sense of the word "flower"
  • WordNet encodes 470 hyponyms of flower sense 1, "flowers as plants"
  • Nouns from retrieved passages can be searched against these hyponyms
  • When was Berlin's Brandenburger Tor erected?
  • Semantic alternation: "erect" is a hyponym of sense 1 of "build"

69
WN for Answer Type Recognition
  • Encodes 8707 English concepts to help recognize
    expected answer type
  • Mapping to parts of WordNet done by hand
  • Can connect to Noun, Adj, and/or Verb
    subhierarchies

70
WN in Passage Retrieval
  • Identify relevant passages from text:
  • Extract keywords from the question, and
  • Pass them to the retrieval module
  • Specificity: filtering of question concepts/keywords
  • Focuses the search; improves performance and precision
  • Question keywords can be omitted from the search if they are too general
  • Specificity is calculated by counting the hyponyms of a given keyword in WordNet
  • The count ignores proper names and same-headed concepts
  • A keyword is thrown out if its count is above a given threshold (currently 10)
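A sketch of the specificity filter using NLTK's WordNet interface; the proper-name and same-head exclusions from the slide are omitted for brevity:

```python
from nltk.corpus import wordnet as wn

def is_too_general(keyword, threshold=10):
    """A keyword with many WordNet hyponyms is considered too general."""
    hyponym_count = 0
    for synset in wn.synsets(keyword, pos=wn.NOUN):
        hyponym_count += len(synset.hyponyms())
    return hyponym_count > threshold

def filter_keywords(keywords, threshold=10):
    return [k for k in keywords if not is_too_general(k, threshold)]
```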

71
WN in Answer Extraction
  • If keywords alone cannot find an acceptable answer, look for alternations in WordNet!

Q196 Who wrote Hamlet?
Morphological alternation: wrote → written
Answer: ...before the young playwright has written Hamlet and Shakespeare seizes the opportunity...
Q136 Who is the queen of Holland?
Lexical alternation: Holland → Netherlands
Answer: ...Princess Margriet, sister of Queen Beatrix of the Netherlands, was also present...
Q196 What is the highest mountain in the world?
Semantic alternation: mountain → peak
Answer: ...first African country to send an expedition to Mount Everest, the world's highest peak...
72
Evaluation
  • Pasca/Harabagiu (NAACL '01 Workshop) measured the approach using the TREC-8 and TREC-9 test collections
  • WN contributions to Answer Type Recognition:
  • Count the number of questions for which acceptable answers were found: 3GB text collection, 893 questions

Method (questions with correct answer type) | All | "What" only
Flat dictionary (baseline) | 227 (32%) | 48 (13%)
A-type taxonomy (static) | 445 (64%) | 179 (50%)
A-type taxonomy (dynamic) | 463 (67%) | 196 (56%)
A-type taxonomy (dynamic + answer patterns) | 533 (76%) | 232 (65%)
73
Evaluation
  • WN contributions to Passage Retrieval
  • Impact of keyword alternations
  • Impact of specificity knowledge

No alternations enabled | 55.3% precision
Lexical alternations enabled | 67.6%
Lexical + semantic alternations enabled | 73.7%
Morphological expansions enabled | 76.5%

Specificity knowledge (questions with correct answer in first 5 documents returned) | TREC-8 | TREC-9
Not included | 133 (65%) | 463 (67%)
Included | 151 (76%) | 515 (74%)
74
Going Beyond Word Matching
  • Use techniques from artificial intelligence to
    try to draw inferences from the meanings of the
    words
  • This is a highly unusual and ambitious approach.
  • Surprising it works at all!
  • Requires huge amounts of hand-coded information
  • Uses notions of proofs and inference from logic
  • All birds fly. Robins are birds. Thus, robins fly.
  • forall(X) bird(X) -> fly(X)
  • forall(X,Y) student(X), enrolled(X,Y) -> school(Y)

75
Inference via a Logic Prover
  • The LCC system attempts inference to justify an answer
  • Its inference engine is a kind of funny middle ground between logic and pattern matching
  • But quite effective: 30% improvement
  • Q: When was the internal combustion engine invented?
  • A: The first internal-combustion engine was built in 1867.
  • invent -> create_mentally -> create -> build

76
COGEX
  • World knowledge from:
  • WordNet glosses converted to logic forms in the eXtended WordNet (XWN) project
  • Lexical chains:
  • game#n#3 → HYPERNYM → recreation#n#1 → HYPONYM → sport#n#1
  • Argentine#a#1 → GLOSS → Argentina#n#1
  • NLP axioms to handle complex NPs, coordinations, appositions, equivalence classes for prepositions, etc.
  • Named-entity recognizer:
  • John Galt → HUMAN
  • A relaxation mechanism is used to iteratively uncouple predicates and remove terms from LFs. Proofs are penalized based on the amount of relaxation involved.

77
Logic Inference Example
  • How hot does the inside of an active volcano get?
  • get(TEMPERATURE, inside(volcano(active)))
  • "lava fragments belched out of the mountain were as hot as 300 degrees Fahrenheit"
  • fragments(lava, TEMPERATURE(degrees(300)), belched(out, mountain))
  • volcano ISA mountain
  • lava ISPARTOF volcano -> lava inside volcano
  • fragments of lava HAVEPROPERTIESOF lava
  • The needed semantic information is in WordNet definitions, and was successfully translated into a form that was used for rough proofs

78
Axiom Creation
  • XWN Axioms:
  • A major source of world knowledge is a general-purpose knowledge base of more than 50,000 parsed and disambiguated WordNet glosses that are transformed into logical form for use during the course of a proof.
  • Gloss:
  • "kill" is "to cause to die"
  • Logical form:
  • kill_VB_1(e1,x1,x2) -> cause_VB_1(e1,x1,x3) & to_TO(e1,e2) & die_VB_1(e2,x2,x4)

79
Lexical Chains
  • Lexical chains:
  • Lexical chains provide an improved source of world knowledge by supplying the logic prover with much-needed axioms to link question keywords with answer concepts.
  • Question:
  • How were biological agents acquired by bin Laden?
  • Answer:
  • "On 8 July 1998, the Italian newspaper Corriere della Sera indicated that members of The World Front for Fighting Jews and Crusaders, which was founded by Bin Laden, purchased three chemical and biological_agent production facilities in ..."
  • Lexical chain:
  • (v - buy#1, purchase#1) → HYPERNYM → (v - get#1, acquire#1)

80
  • Axiom selection:
  • Lexical chains and the XWN knowledge base work together to select and generate the axioms needed for a successful proof when not all of the keywords in the question are found in the answer.
  • Question:
  • How did Adolf Hitler die?
  • Answer:
  • "Adolf Hitler committed suicide ..."
  • The following lexical chain is detected:
  • (n - suicide#1, self-destruction#1, self-annihilation#1) → GLOSS → (v - kill#1) → GLOSS → (v - die#1, decease#1, perish#1, go#17, exit#3, pass_away#1, expire#2, pass#25)
  • The following axioms are loaded into the prover:
  • exists x2 all e1 x1 (suicide_nn(x1) -> act_nn(x1) & of_in(x1,e1) & kill_vb(e1,x2,x2)).
  • exists x3 x4 all e2 x1 x2 (kill_vb(e2,x1,x2) -> cause_vb_2(e1,x1,x3) & to_to(e1,e2) & die_vb(e2,x2,x4)).

81
LCC System References
  • The previous set of slides drew information from
    these sources
  • The Informative Role of WordNet in Open-Domain
    Question Answering, Pasca and Harabagiu, WordNet
    and Other Lexical Resources, NAACL 2001 Workshop
  • Pasca and Harabagiu, High Performance Question/Answering, SIGIR '01
  • Moldovan, Clark, Harabagiu, Maiorano, COGEX: A Logic Prover for Question Answering, HLT-NAACL 2003
  • Moldovan, Pasca, Harabagiu, and Surdeanu, Performance Issues and Error Analysis in an Open-Domain Question Answering System, ACM Trans. Inf. Syst. 21(2): 133-154 (2003)
  • Harabagiu and Maiorano, Abductive Processes for
    Answer Justification, AAAI Spring Symposium on
    Mining Answers from Texts and Knowledge Bases,
    2002

82
Using Machine Learning in QA
  • The following slides are based on:
  • Ramakrishnan, Chakrabarti, Paranjpe, Bhattacharyya, Is Question Answering an Acquired Skill?, WWW '04

83
Learning Answer Type Mapping
  • Idea: use machine learning techniques to automatically determine answer types and query terms from questions.
  • Two kinds of answer types:
  • Surface patterns
  • An infinite set, so they can't be covered by a lexicon
  • DATES, NUMBERS, PERSON NAMES, LOCATIONS
  • at DDDD; in the DDs; in DDDD; Xx said
  • Can also associate with a synset, e.g., date#n#7
  • WordNet synsets
  • Consider "name an animal that sleeps upright"
  • Answer: horse

Ramakrishnan et al., Is Question Answering an
Acquired Skill? WWW04
84
Determining Answer Types
  • The hard ones are "what" and "which" questions.
  • Two useful heuristics:
  • If the head of the NP appearing before the auxiliary or main verb is not a wh-word, mark it as an atype clue
  • Otherwise, the head of the NP appearing after the auxiliary/main verb is an atype clue.

Ramakrishnan et al., Is Question Answering an
Acquired Skill? WWW04
85
Learning Answer Types
  • Given a QA pair (q, a):
  • (name an animal that sleeps upright, horse)
  • (1a) See which atype(s) "horse" can map to
  • (1b) Look up the hypernyms of "horse" → S
  • (2a) Record the k words to the right of the q-word
  • (2b) For each of these k words, look up their synsets
  • an, animal, that
  • (2c) Increment the counts for those synsets that also appear in S
  • Do significance testing:
  • Compare synset frequencies against a background set
  • Retain only those synsets that are significantly more associated with the question word than they are in general (chi-square)

Ramakrishnan et al., Is Question Answering an
Acquired Skill? WWW04
86
Learning Answer Types
Ramakrishnan et al., Is Question Answering an
Acquired Skill? WWW04
87
Learning to Choose Query Terms
  • Which words from the question should be used in the query?
  • A tradeoff between precision and recall.
  • Example:
  • Tokyo is the capital of which country?
  • Want to use "Tokyo" verbatim
  • Probably "capital" as well
  • But maybe not "country": maybe "nation", or maybe this word won't appear in the retrieved passage at all.
  • Also, "country" corresponds to the answer type, so we probably don't want to require it to be in the answer text.

Ramakrishnan et al., Is Question Answering an
Acquired Skill? WWW04
88
Learning to Choose Query Terms
  • Features:
  • POS assigned to the word and its immediate neighbors
  • Starts with an uppercase letter
  • Is a stopword
  • IDF score
  • Is an answer-type word for this question
  • Ambiguity indicators:
  • # of possible WordNet senses (NumSense)
  • # of other WordNet synsets that describe this sense, e.g., for "buck": stag, deer, doe (NumLemma)
  • Learner:
  • A J48 decision tree worked best

Ramakrishnan et al., Is Question Answering an
Acquired Skill? WWW04
89
Learning to Choose Query Terms
  • Results:
  • WordNet ambiguity indicators were very helpful
  • Raised accuracy from 71-73% to 80%
  • The atype flag improved accuracy by a further 1-3%

90
Learning to Score Passages
  • Given a question, an answer, and a passage (q, a, r):
  • Assign +1 if r contains a
  • Assign -1 otherwise
  • Features:
  • Do selected terms s from q appear in r?
  • Does r have an answer zone a that does not overlap s?
  • Are the distances between tokens in a and s small?
  • Does a have a strong WordNet similarity with q's answer type?
  • Learner:
  • Use logistic regression, since it produces a ranking rather than a hard classification into +1 or -1
  • Produces a continuous estimate between 0 and 1

Ramakrishnan et al., Is Question Answering an
Acquired Skill? WWW04
91
Learning to Score Passages
  • Results:
  • F-scores are low (0.33 - 0.56)
  • However, reranking greatly improves the rank of the corresponding passages.
  • It eliminates many non-answers, pushing better passages towards the top.

92
Learning to Score Passages
Ramakrishnan et al., Is Question Answering an
Acquired Skill? WWW04
93
Computing WordNet Similarity
  • Path-based similarity measures are not all that good in WordNet:
  • 3 hops from entity to artifact
  • 3 hops from mammal to elephant
  • An alternative:
  • Given a target synset t and an answer synset a
  • Measure the overlap of nodes on the paths from t to all noun roots and from a to all noun roots
  • Algorithm for computing the similarity of t to a:
  • If t is not a hypernym of a, assign 0
  • Else collect the sets of hypernym synsets of t and a
  • Call them Ht and Ha
  • Compute the Jaccard overlap:
  • |Ht ∩ Ha| / |Ht ∪ Ha|

Ramakrishnan et al., Is Question Answering an
Acquired Skill? WWW04
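A sketch of this hypernym-overlap similarity using NLTK's WordNet interface, following the slide's convention that Ht and Ha are the strict ancestors of t and a (actual WordNet counts may differ from the figures on the next slide because of multiple inheritance):

```python
from nltk.corpus import wordnet as wn

def hypernym_set(synset):
    """All strict ancestors (hypernyms, transitively) of a synset."""
    return set(synset.closure(lambda s: s.hypernyms()))

def wn_similarity(t, a):
    ht, ha = hypernym_set(t), hypernym_set(a)
    if t not in ha:                  # t must be an ancestor of a, else score 0
        return 0.0
    return len(ht & ha) / len(ht | ha)

# e.g. wn_similarity(wn.synset("mammal.n.01"), wn.synset("elephant.n.01"))
```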
94
Computing WordNet Similarity
  • Algorithm for computing the similarity of t to a:
  • |Ht ∩ Ha| / |Ht ∪ Ha|

[Diagram: the hypernym chain entity → object → living thing → organism → animal → chordate → vertebrate → mammal → placental mammal → proboscidean → elephant, with the regions Ht ∩ Ha and Ht ∪ Ha marked.]

Examples:
Ht = mammal, Ha = elephant: 7/10 = 0.7
Ht = animal, Ha = elephant: 5/10 = 0.5
Ht = animal, Ha = mammal: 4/7 = 0.57
Ht = mammal, Ha = fox: 7/11 = 0.63
Ramakrishnan et al., Is Question Answering an
Acquired Skill? WWW04
95
System Extension Definition Questions
  • Definition questions ask about the definition or description of a concept:
  • Who is John Galt?
  • What is anorexia nervosa?
  • Many information nuggets are acceptable answers:
  • Who is George W. Bush?
  • ...George W. Bush, the 43rd President of the United States...
  • ...George W. Bush defeated Democratic incumbent Ann Richards to become the 46th Governor of the State of Texas...
  • Scoring:
  • Any information nugget is acceptable
  • Precision score over all information nuggets

96
Definition Detection with Pattern Matching
Q386 What is anorexia nervosa? cause of anorexia nervosa, an eating disorder...
Q358 What is a meerkat? the meerkat, a type of mongoose, thrives in...
Q340 Who is Zebulon Pike? in 1806, explorer Zebulon Pike sighted the...
97
Answer Detection with Concept Expansion
  • Enhancement for definition questions:
  • Identify terms that are semantically related to the phrase being defined
  • Use WordNet hypernyms (more general concepts)

Question | WordNet hypernym | Detected answer candidate
What is a shaman? | priest, non-Christian priest | Mathews is the priest or shaman
What is a nematode? | worm | nematodes, tiny worms in soil
What is anise? | herb, herbaceous plant | anise, rhubarb and other herbs
98
Online QA Examples
  • Examples (none work very well):
  • AnswerBus
  • http://www.answerbus.com
  • Ionaut
  • http://www.ionaut.com:8400/
  • LCC
  • http://www.languagecomputer.com/demos/question_answering/index.html