Techniques Used in Modern Question-Answering Systems - PowerPoint PPT Presentation

About This Presentation

Title:

Techniques Used in Modern Question-Answering Systems

Description:

Systems based on semantic representations (Lehnert) ... dragon stole Mary from the castle. John got on top of his horse and killed the dragon. ...

Slides: 22

Provided by: ElenaAF7

Learn more at: http://www1.cs.columbia.edu

Transcript and Presenter's Notes

1
Techniques Used in Modern Question-Answering Systems

Candidacy Exam
Elena Filatova
December 11, 2002
Committee:
Luis Gravano
Vasileios Hatzivassiloglou
Rebecca J. Passonneau
Columbia University, Department of Computer Science

2
Present vs Past Research on QA

Current systems
  Mainly systems written for the TREC conference
    factoid questions
    short answers
    huge text collections
Related systems
  IR
    queries vs. questions
    returned documents vs. short answers
  Systems based on semantic representations (Lehnert)
    questions about one text vs. text collections
    inference from the semantic structure of a text vs. searching for an answer in the text
  One type of output (NP) from a closed collection (Kupiec)
    answer inference vs. answer extraction

3
Lehnert system

John loved Mary but she didn't want to marry him. One day, a dragon stole Mary from the castle. John got on top of his horse and killed the dragon. Mary agreed to marry him. They lived happily ever after.
Q: Why did Mary agree to marry John?
A: Because she was indebted to him.
Problems stated:
  right classification
  dependency of the answer inference procedure on the type of the question

4
Current QA Systems
[Diagram: question -> question analysis -> query -> extracted documents -> rules for answer -> list of answers]
Problem points marked on the diagram: right query; long text; domain dependency; predefined types of answers
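The pipeline in the diagram can be sketched as a chain of small functions, one per box. Everything here is a hypothetical toy (the stopword list, the answer-type rules, the overlap-based retrieval, the year-only answer rule, and all function names), meant only to show where the problems listed on the slide enter.

```python
import re

# Hypothetical stopword list used when turning a question into a keyword query.
STOPWORDS = {"what", "who", "when", "where", "why", "is", "was", "the", "a", "an", "of", "did", "in"}

def analyze_question(question: str) -> str:
    """Question analysis: assign one of a few predefined answer types (toy rules)."""
    q = question.lower()
    if q.startswith("when"):
        return "DATE"
    if q.startswith("who"):
        return "PERSON"
    return "OTHER"

def build_query(question: str) -> list[str]:
    """Query formation: keep the content words of the question."""
    words = re.findall(r"[a-z0-9]+", question.lower())
    return [w for w in words if w not in STOPWORDS]

def retrieve(query: list[str], docs: list[str], k: int = 2) -> list[str]:
    """Document retrieval: rank documents by how many query terms they contain."""
    def overlap(doc: str) -> int:
        return len(set(query) & set(re.findall(r"[a-z0-9]+", doc.lower())))
    ranked = sorted(docs, key=overlap, reverse=True)
    return [d for d in ranked[:k] if overlap(d) > 0]

def extract_answers(answer_type: str, docs: list[str]) -> list[str]:
    """Answer rules: only DATE (a four-digit year) is handled in this toy."""
    if answer_type == "DATE":
        return [m for d in docs for m in re.findall(r"\b[12][0-9]{3}\b", d)]
    return []

def answer(question: str, docs: list[str]) -> list[str]:
    answer_type = analyze_question(question)
    query = build_query(question)
    return extract_answers(answer_type, retrieve(query, docs))
```

For instance, `answer("When was Columbia University founded?", ["Columbia University was founded in 1754.", "The dragon stole Mary from the castle."])` returns `["1754"]`, while a *why* question falls through to `OTHER` and gets no answer: the "predefined types of answers" limitation in miniature.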

5
Plan

Classification
Information (document) retrieval
Query formation
Information extraction
Passage extraction
Answer extraction
Usage of answer redundancy on the Web in QA
QA for restricted domains
Evaluation procedure for current QA systems and analysis of their performance

6
Classification and QA
[Diagram: question -> question analysis -> query -> extracted documents -> rules for answer -> list of answers]

7
Theory of Classification

Rosch et al.: classification of basic objects
The world is structured: real-world attributes do not occur independently of each other
  e.g., $P(\text{feathers} \mid \text{wings}) > P(\text{fur} \mid \text{wings})$: an object with wings is more likely to have feathers than fur
Each category (class) is a set of attributes common to all the objects in the category
Types of categories:
  Superordinate: few common attributes (furniture)
  Subordinate: many common attributes (floor lamp, desk lamp)
  Basic: an optimal number of common attributes (lamp); basic objects are the most inclusive categories that delineate the correlation structure of the environment
Though classification is a converging problem for objects, it is not possible to compile a list of all possible basic categories.
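The claim that attributes are correlated can be checked by estimating conditional probabilities from co-occurrence counts. The object list below is invented purely for illustration; it is not data from Rosch et al.

```python
# Hypothetical toy data: each object is described by the set of attributes it has.
OBJECTS = {
    "sparrow": {"wings", "feathers", "beak"},
    "eagle": {"wings", "feathers", "beak"},
    "bat": {"wings", "fur"},
    "dog": {"fur", "tail"},
    "cat": {"fur", "tail"},
}

def conditional(attr: str, given: str, objects: dict[str, set[str]]) -> float:
    """Estimate P(object has attr | object has given) from the toy counts."""
    with_given = [attrs for attrs in objects.values() if given in attrs]
    if not with_given:
        return 0.0
    return sum(attr in attrs for attrs in with_given) / len(with_given)

p_feathers = conditional("feathers", "wings", OBJECTS)  # 2 of the 3 winged objects
p_fur = conditional("fur", "wings", OBJECTS)            # 1 of the 3 winged objects
```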

8
QA classification

Hierarchical/non-hierarchical classification
  Even if the classification is hierarchical, it can be represented as a flat set: detailed classes plus an "other" class
Number of types
  (MULDER: 3 types vs. Webclopedia: over 140 types)
Trade-off between
  detailed classes for better answer extraction, and
  high precision in determining the class
Usage of semantics
Usage of syntax
  Most syntactic parsers are trained on corpora that contain few questions (WSJ) => an additional corpus is needed
Attempts to automate this process
  Maximum Entropy (Ittycheriah)
  Classifiers (Li and Roth)
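A flat classification with an "other" class can be sketched with hand-written rules. This is not the Maximum Entropy model of Ittycheriah or the learned classifiers of Li and Roth; the class names and patterns are hypothetical, chosen to show how detailed classes (e.g. DATE_OF_BIRTH rather than just DATE) sit beside a catch-all OTHER.

```python
import re

# Hypothetical flat class inventory: rules are tried in order,
# and any question no rule covers falls into OTHER.
RULES = [
    ("DATE_OF_BIRTH", re.compile(r"^when was .+ born\b")),
    ("DATE", re.compile(r"^(when|what year)\b")),
    ("PERSON", re.compile(r"^who\b")),
    ("LOCATION", re.compile(r"^where\b")),
    ("NUMBER", re.compile(r"^how (many|much)\b")),
]

def classify(question: str) -> str:
    q = question.lower().strip()
    for label, pattern in RULES:
        if pattern.search(q):
            return label
    return "OTHER"
```

Rule order matters here: the detailed DATE_OF_BIRTH rule has to come before the coarser DATE rule, a small instance of the trade-off between detailed classes and classification precision.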

9
Why is QA classification important?

The question type is used for:
1. query construction
  question keyword filtering mechanism (Harabagiu)
  synonyms and synsets from WordNet (Webclopedia)
    in both cases there is no connection with the space of possible answers
  information retrieval (Agichtein, Berger)
    there is a connection between the question and answer spaces
    but these types do not give the type of the answer
2. searching for a correct answer in the passage extracted from a text

10
Logical Forms

Syntactic analysis plus semantics => logical form (LF)
Mapping of the question LF to potential answer LFs to find the best match (Harabagiu, Webclopedia)
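LF matching can be illustrated by representing each LF as a set of predicate triples and counting how many question predicates an answer LF satisfies once the question's answer variable is bound. The triple encoding, the `?x` variable, the predicate names, and the counting score are simplifying assumptions, not the LF format used by Harabagiu's system or Webclopedia.

```python
from typing import Optional

# Hypothetical LF encoding: a set of (predicate, arg1, arg2) triples.
# "?x" in a question LF marks the slot the answer must fill.
Predicate = tuple[str, str, str]

def bind(lf: set[Predicate], value: str) -> set[Predicate]:
    """Replace the answer variable ?x with a concrete value."""
    return {tuple(value if arg == "?x" else arg for arg in pred) for pred in lf}

def match_score(question_lf: set[Predicate], answer_lf: set[Predicate]) -> tuple[int, Optional[str]]:
    """Return (number of matched predicates, binding of ?x) for the best binding."""
    candidates = sorted({arg for _, a1, a2 in answer_lf for arg in (a1, a2)})
    best: tuple[int, Optional[str]] = (0, None)
    for value in candidates:
        score = len(bind(question_lf, value) & answer_lf)
        if score > best[0]:
            best = (score, value)
    return best
```

For "Who got on a horse and killed the dragon?", the question LF `{("ride", "?x", "horse"), ("kill", "?x", "dragon")}` matched against the LF of the Lehnert story sentence yields `(2, "John")`.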

11
Query formation

WordNet: synonyms, hyponyms, etc.
Morphology: verb forms, plural/singular nouns, etc.
Knowledge of the domain (IBM's system)
Statistical methods for connecting the question and answer spaces
  Agichtein: automatic acquisition of patterns that might be good candidates for query expansion
    4 question types
  Berger: to facilitate query modification (expansion), each question term gets a set of answer terms
    FAQ: a closed set of question-answer pairs
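Two of these expansion ideas can be sketched side by side: hand-written synonym and inflection tables (standing in for WordNet and a morphological analyzer, neither of which is used here), and a mapping from question terms to answer terms counted over a closed set of question-answer pairs. The tables, the pairs, and the raw co-occurrence count are assumptions for illustration; the statistical methods in the cited work are more elaborate than plain counting.

```python
from collections import Counter, defaultdict

# Hypothetical stand-ins for WordNet synonyms and a morphological analyzer.
SYNONYMS = {"kill": ["slay"], "marry": ["wed"]}
INFLECTIONS = {"kill": ["kills", "killed", "killing"], "dragon": ["dragons"]}

def expand_lexically(terms: list[str]) -> list[str]:
    """Append synonyms and inflected forms after each original term, without duplicates."""
    expanded: list[str] = []
    for term in terms:
        for word in [term] + SYNONYMS.get(term, []) + INFLECTIONS.get(term, []):
            if word not in expanded:
                expanded.append(word)
    return expanded

def learn_term_map(pairs: list[tuple[str, str]]) -> dict[str, Counter]:
    """Count which answer terms co-occur with each question term across the QA pairs."""
    table: dict[str, Counter] = defaultdict(Counter)
    for question, answer in pairs:
        q_terms = set(question.lower().split())
        a_terms = set(answer.lower().split())
        for q in q_terms:
            table[q].update(a_terms - q_terms)  # ignore words shared with the question
    return table

def expand_statistically(terms: list[str], table: dict[str, Counter], k: int = 2) -> list[str]:
    """Add the k answer terms most often paired with each question term."""
    extra: list[str] = []
    for term in terms:
        for word, _ in table.get(term, Counter()).most_common(k):
            if word not in terms and word not in extra:
                extra.append(word)
    return terms + extra
```

With the hypothetical pairs `("reset password", "use reset link")`, `("forgot password", "use reset link")`, and `("password expired", "request new link")`, the query `["password"]` expands to `["password", "link", "use"]`.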

12
Information retrieval

Classical IR is the first step of QA
  Vector-space model (similarity between the term vector of the query and the term vector of each document)
IR techniques used in current QA systems usually target a single database (either the Web or the TREC collection)
Is it possible to apply distributed IR techniques?
  domain-restricted QA with extra knowledge about the text collection
    IBM system
  splitting one big collection of documents into smaller collections about specific topics
    this might require changes in question classification; the question type might in turn change query formulation, document extraction, and answer extraction
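The vector-space model can be sketched with TF-IDF weights and cosine similarity. The weighting variant used here (raw term frequency times log(N/df)) is one common choice among several and is an assumption, not necessarily what any cited QA system used.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(docs: list[str]) -> tuple[list[dict[str, float]], dict[str, float]]:
    """Weight each term by tf * log(N / df); a term present in every doc gets weight 0."""
    tokenized = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(term for tf in tokenized for term in tf)
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [{t: f * idf[t] for t, f in tf.items()} for tf in tokenized]
    return vectors, idf

def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank(query: str, docs: list[str]) -> list[int]:
    """Return document indices ordered by cosine similarity to the query."""
    vectors, idf = tfidf_vectors(docs)
    q_tf = Counter(tokenize(query))
    q_vec = {t: f * idf.get(t, 0.0) for t, f in q_tf.items()}  # unseen terms weigh 0
    return sorted(range(len(docs)), key=lambda i: cosine(q_vec, vectors[i]), reverse=True)
```

Ranking `["the dragon stole mary", "john killed the dragon", "mary agreed to marry john"]` against "who killed the dragon" puts the second document first, because the rare term "killed" carries the most weight.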

13
[Diagram: question -> question analysis -> query -> extracted documents -> rules for answer -> list of answers]

14
Passage extraction

Passages of a particular length (Cardie): a vector representation for each passage
Paragraphs or sentences
Classical text excerpting
  Each sentence is assigned a score
  Retrieved passages are formed by taking the sentences with the highest scores
Global-Local Processing (Salton)
McCallum: passage extraction based not only on words but also on other features (e.g., syntactic constructions)
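Classical text excerpting can be sketched as: score every sentence, then keep the top-scoring ones. Scoring by the number of distinct query words in a sentence, the naive sentence splitter, and the parameter `k` are simplifying assumptions; the systems cited use richer representations and features.

```python
import re

def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def score(sentence: str, query_terms: set[str]) -> int:
    """Number of distinct query words that occur in the sentence."""
    words = set(re.findall(r"[a-z0-9]+", sentence.lower()))
    return len(words & query_terms)

def extract_passage(text: str, query: str, k: int = 2) -> list[str]:
    """Return the k highest-scoring sentences, kept in their original order."""
    query_terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    sentences = split_sentences(text)
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i], query_terms), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```

On the Lehnert story with the query "Why did Mary agree to marry John?", this picks "John loved Mary but she didn't want to marry him." and "Mary agreed to marry him."; without stemming, "agreed" does not match "agree", which is one reason richer features help.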

15
Information Extraction

Domain dependency (Grishman)
  a predefined set of attributes to search for, specific to each topic, e.g., terrorism: victims, locations, perpetrators
  usually a lot of manually tagged data for training
  or
  texts divided into two groups: one topic vs. all other texts (Riloff)
  in both cases, division into topics is a necessary step, which is not applicable to open-domain QA systems
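When training texts are only split into "one topic" and "all other texts", candidate extraction patterns can be ranked by how concentrated they are in the topic texts. The sketch uses a relevance-rate times log-frequency score in the spirit of Riloff's approach; the exact formula, the patterns, and the counts are assumptions for illustration.

```python
import math

def rank_patterns(counts: dict[str, tuple[int, int]]) -> list[tuple[str, float]]:
    """counts maps pattern -> (frequency in topic texts, frequency in all texts).

    Assumed score: (rel / total) * log2(rel), so a pattern seen at most once
    in the topic texts scores 0 no matter how concentrated it is.
    """
    scored = []
    for pattern, (rel, total) in counts.items():
        score = (rel / total) * math.log2(rel) if rel > 0 else 0.0
        scored.append((pattern, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical counts for terrorism-topic patterns.
COUNTS = {
    "<subj> was kidnapped": (8, 10),
    "<subj> exploded": (4, 4),
    "<subj> said": (16, 64),
    "<subj> was bombed": (1, 1),
}
```

The log-frequency factor keeps a pattern seen once in the topic texts (like "<subj> was bombed" here) from outranking frequent, reliable ones, while the relevance rate pushes down generic patterns such as "<subj> said".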

16
What information can be extracted (IE)

Named entities (NE tagging)
  Numbers (incl. dates, ZIP codes, etc.)
  Proper names (locations, people, etc.)
  Others, depending on the system
TREC-8: 80% of the questions asked for NEs
NEs might also support
  Correlated entities: a mini-CV (Srihari)
    Who is Julian Hill?
    name, age, gender, position, affiliation, education
  General events (Srihari)
    Who did what to whom, when
More complicated IE techniques lead QA back to the AI approach
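A few of the number-like entity types listed above can be tagged with regular expressions. The patterns are deliberately simplistic assumptions (US-style ZIP codes, dates written like "December 11, 2002"), and any five-digit number is tagged as a ZIP code; proper names generally need a trained tagger and are not covered.

```python
import re

# Hypothetical patterns, tried in order; earlier types win over later ones
# when spans overlap.
MONTHS = "January|February|March|April|May|June|July|August|September|October|November|December"
PATTERNS = [
    ("DATE", re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}, \d{{4}}\b")),
    ("ZIP", re.compile(r"\b\d{5}(?:-\d{4})?\b")),
    ("NUMBER", re.compile(r"\b\d+(?:\.\d+)?\b")),
]

def tag_entities(text: str) -> list[tuple[str, str]]:
    """Return (type, string) pairs in text order, skipping spans already tagged."""
    taken: list[tuple[int, int]] = []
    found = []
    for label, pattern in PATTERNS:
        for m in pattern.finditer(text):
            if any(m.start() < end and start < m.end() for start, end in taken):
                continue
            taken.append((m.start(), m.end()))
            found.append((m.start(), label, m.group()))
    return [(label, s) for _, label, s in sorted(found)]
```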

17
Answer Extraction

The three main techniques for answer extraction are based on:
  syntactic-semantic tree dependencies (Harabagiu, Webclopedia)
    the LF of the question is mapped to the LFs of possible answers
  surface patterns (Webclopedia)
    <Name> (<Answer> -)
    <Name> was born on <Answer>
    Good patterns require detailed classification: NUMBER vs. DOB
  text window
    Cardie: query-dependent text summarization of text passages, with or without syntactic and semantic information

Analogy with MT: LF mapping ~ classical MT; surface patterns ~ example-based MT; text window ~ statistical MT
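Surface-pattern extraction for the two birth-date patterns can be sketched by substituting the question's target for `<Name>` and turning `<Answer>` into a capture group. Two assumptions are made here: the first pattern is read as `<Name> (<Answer> -`, matching only up to the hyphen, because in text like "Mozart (1756 - 1791)" the death year sits between the hyphen and the closing parenthesis; and the answer is constrained to a date shape, which is what a detailed DOB class buys over a generic NUMBER. The sample snippets are invented.

```python
import re

# Templates adapted from the slide (see the assumption about the first one above).
TEMPLATES = [
    "<Name> (<Answer> -",
    "<Name> was born on <Answer>",
]
# Assumed DOB shape: an optional "Month day, " prefix followed by a four-digit year.
ANSWER_RE = r"((?:[A-Z][a-z]+ \d{1,2}, )?\d{4})"

def template_to_regex(template: str, name: str) -> re.Pattern:
    head, tail = template.split("<Answer>")
    prefix = re.escape(head.replace("<Name>", name))
    return re.compile(prefix + ANSWER_RE + r"\s*" + re.escape(tail.strip()))

def extract_dob(name: str, snippets: list[str]) -> list[str]:
    """Collect every answer captured by any template, template by template."""
    answers = []
    for template in TEMPLATES:
        pattern = template_to_regex(template, name)
        for snippet in snippets:
            answers.extend(m.group(1) for m in pattern.finditer(snippet))
    return answers
```

The number in "Mozart wrote 41 symphonies." is never returned, since neither template context nor the DOB shape fits it.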
18
Usage of the Web (answer redundancy)

Multiple formulations of an answer can be useful for:
1. the IR stage: increased chances of finding an answer that matches the query (Clarke, Brill)
  no need to search for an exact formulation of the answer
2. the IE stage: facilitation of answer extraction (Agichtein, Ravichandran, Brill)
  create a list of patterns that might contain the answer
  either completely automatically (Agichtein) or using handwritten filters based on question types and domain (Brill)
Answer validation (Magnini)
  based on the redundancy of correct answers
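The redundancy idea can be sketched as voting: extract candidates from many retrieved snippets and rank them by how many snippets support them, so an answer stated in several different ways outweighs a one-off mention. The snippets are hypothetical, and the single four-digit-year filter stands in for the question-type filters mentioned above.

```python
import re
from collections import Counter

def vote(snippets: list[str], candidate_re: str = r"\b[12]\d{3}\b") -> list[tuple[str, int]]:
    """Count how many snippets mention each candidate (at most one vote per snippet)."""
    votes: Counter = Counter()
    for snippet in snippets:
        votes.update(set(re.findall(candidate_re, snippet)))
    return votes.most_common()
```

For the question "When was Mozart born?", the snippets "Mozart was born in 1756 in Salzburg.", "Born 1756, died 1791.", "Mozart (1756-1791) composed 41 symphonies.", and "A 1991 biography of Mozart." give 1756 three votes, ahead of 1791 (two) and 1991 (one).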

19
Domain restricted applications

FAQ (different from IR or QA)
  match the input question with a list of already existing questions
  predefined output (according to the above question matching)
Riloff
  5 types of questions
  answer extraction from a given text => no IR stage
  there is always an answer (a unique answer)
IBM system
  based on good knowledge of the internal structure of the IBM website
  use of FAQ techniques
  results are better than for open-domain QA systems

restricted-domain MT vs open-domain MT
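FAQ-style matching can be sketched as: compare the input question with each stored question and return the predefined answer of the closest one. The word-overlap (Jaccard) similarity, the 0.3 threshold, and the FAQ entries are assumptions for illustration.

```python
import re
from typing import Optional

# Hypothetical FAQ: each stored question maps to a predefined answer.
FAQ = {
    "How do I reset my password?": "Use the 'Forgot password' link on the login page.",
    "How do I change my email address?": "Edit it under Account Settings.",
}

def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def answer_faq(question: str, faq: dict[str, str] = FAQ, threshold: float = 0.3) -> Optional[str]:
    """Return the predefined answer of the most similar stored question, or None."""
    if not faq:
        return None
    q = words(question)
    best = max(faq, key=lambda stored: jaccard(q, words(stored)))
    return faq[best] if jaccard(q, words(best)) >= threshold else None
```

Unlike open-domain QA, the output space is closed: an unmatched question such as "What is the capital of France?" simply returns `None`.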
20
Evaluation

IR and IE have different evaluation measures
  IR: each document is marked either relevant or non-relevant => recall, precision
  IE: a gold-standard answer key enumerates all acceptable responses => recall, precision
  QA: mean reciprocal rank (MRR)
    Each question receives a score equal to the reciprocal of the rank of the first correct response, or 0 if no correct response is found.
    The overall system score is the mean of the individual question scores.

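N: number of questions asked; K_i: rank of the first correct answer to question i, or 0 if there is none
$$\mathrm{RAR}_i = \begin{cases} 1/K_i & \text{if } K_i > 0 \\ 0 & \text{if } K_i = 0 \end{cases} \qquad \mathrm{MRR} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{RAR}_i$$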
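The MRR definition can be computed directly. The sketch takes, for each question, the rank K_i of the first correct response, with `None` or 0 meaning no correct response was returned; the example ranks are made up.

```python
from typing import Optional

def mean_reciprocal_rank(first_correct_ranks: list[Optional[int]]) -> float:
    """RAR_i = 1/K_i, or 0 when question i has no correct response; MRR is their mean."""
    if not first_correct_ranks:
        raise ValueError("need at least one question")
    rar = [1.0 / k if k else 0.0 for k in first_correct_ranks]
    return sum(rar) / len(rar)
```

For four questions answered correctly at ranks 1, 2, never, and 4, the MRR is (1 + 0.5 + 0 + 0.25) / 4 = 0.4375.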
21
Future of QA
FROM -> TO
Questions: simple facts -> complex; uses judgments, terms; knowledge of user context needed
Answers: simple factoid answers found in a single document -> search of multiple sources; fusion of information; resolution of conflicting data; interpretations, conclusions
