Techniques Used in Modern Question-Answering Systems
1
Techniques Used in Modern Question-Answering
Systems
  • Candidacy Exam
  • Elena Filatova
  • December 11, 2002
  • Committee:
  • Luis Gravano, Columbia University
  • Vasileios Hatzivassiloglou, Department of Computer Science
  • Rebecca J. Passonneau

2
Present vs. Past Research on QA
  • Current systems:
  • Mainly systems written for the TREC conference:
  • factoid questions
  • short answers
  • huge text collections
  • Related systems:
  • IR:
  • queries vs. questions
  • returned documents vs. short answers
  • Systems based on semantic representations (Lehnert):
  • questions about one text vs. text collections
  • inference from the semantic structure of a text vs. searching for an answer in the text
  • One type of output (NP) from a closed collection (Kupiec):
  • answer inference vs. answer extraction

3
Lehnert's system
  • "John loved Mary but she didn't want to marry him. One day, a dragon stole Mary from the castle. John got on top of his horse and killed the dragon. Mary agreed to marry him. They lived happily ever after."
  • Q: Why did Mary agree to marry John?
  • A: Because she was indebted to him.
  • Problems stated:
  • correct classification of the question
  • dependency of the answer-inference procedure on the type of the question

4
Current QA Systems
[Pipeline diagram: question → question analysis → query → extracted documents → rules for answer extraction → list of answers]
  • forming the right query
  • long texts
  • domain dependency
  • predefined types of answers

5
Plan
  • Classification
  • Information (document) retrieval
  • Query formation
  • Information extraction
  • Passage extraction
  • Answer extraction
  • Use of answer redundancy on the Web in QA
  • QA for restricted domains
  • Evaluation procedures for current QA systems and analysis of their performance

6
Classification and QA
[Pipeline diagram repeated: question → question analysis → query → extracted documents → rules for answer extraction → list of answers]
7
Theory of Classification
  • Rosch et al.: classification of basic objects
  • The world is structured: real-world attributes do not occur independently of each other
  • e.g., given object_has(wings): P(object_has(feathers)) > P(object_has(fur)) (illustrated in the sketch below)
  • Each category (class) has a set of attributes that are common to all the objects in the category
  • Types of categories:
  • Superordinate: few common attributes (furniture)
  • Subordinate: many common attributes (floor lamp, desk lamp)
  • Basic: an optimal number of common attributes (lamp); basic objects are the most inclusive categories that delineate the correlation structure of the environment
  • Though classification is a converging problem for objects, it is not possible to compile a list of all possible basic categories.
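
A minimal sketch of the attribute-correlation claim, estimating the two conditional probabilities by counting over a toy object-attribute table (all data below is invented for illustration):

```python
# Rosch's observation: attributes correlate, so knowing an object has
# wings makes feathers more likely than fur. Toy data, invented.
objects = {
    "robin":   {"wings", "feathers"},
    "sparrow": {"wings", "feathers"},
    "bat":     {"wings", "fur"},
    "dog":     {"fur"},
    "cat":     {"fur"},
}

def cond_prob(attr: str, given: str) -> float:
    """P(object_has(attr) | object_has(given)), estimated by counting."""
    having = [attrs for attrs in objects.values() if given in attrs]
    return sum(attr in attrs for attrs in having) / len(having) if having else 0.0

print(cond_prob("feathers", "wings"))  # 2/3 ≈ 0.67
print(cond_prob("fur", "wings"))       # 1/3 ≈ 0.33
```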

8
QA classification
  • Hierarchical/non-hierarchical classification
  • Even if there exists a hierarchy in the classification, it can be represented as flat: detailed classes plus an "other" class
  • Number of types:
  • MULDER: 3 types vs. Webclopedia: over 140 types
  • Trade-off between:
  • detailed classes for better answer extraction, and
  • high precision in assigning the classes
  • Usage of semantics
  • Usage of syntax:
  • Most syntactic parsers are built on corpora which do not contain many questions (WSJ) → need for an additional corpus
  • Attempts to automate this process (a minimal rule-based sketch follows this list):
  • Maximum Entropy (Ittycheriah)
  • Classifiers (Li & Roth)
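
A minimal rule-based sketch of coarse question classification; the type inventory and patterns here are invented for illustration (real systems such as Li & Roth's use learned classifiers over many features):

```python
import re

# Hypothetical coarse question types, loosely inspired by the
# inventories discussed above (from 3 types in MULDER to 140+ in
# Webclopedia). Patterns map wh-phrases to expected answer types.
RULES = [
    (r"^who\b",                   "PERSON"),
    (r"^where\b",                 "LOCATION"),
    (r"^when\b|\bwhat year\b",    "DATE"),
    (r"^how many\b|\bhow much\b", "NUMBER"),
    (r"^why\b",                   "REASON"),
]

def classify(question: str) -> str:
    q = question.lower().strip()
    for pattern, qtype in RULES:
        if re.search(pattern, q):
            return qtype
    return "OTHER"  # flat classification: detailed classes + "other"

print(classify("When was Mozart born?"))  # DATE
print(classify("Who wrote Hamlet?"))      # PERSON
```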

9
Why is QA classification important?
  • Use of the question type for:
  • 1. query construction:
  • question keyword filtering mechanism (Harabagiu)
  • synonyms and synsets from WordNet (Webclopedia)
  • in both cases there is no connection with the possible answer space
  • information retrieval (Agichtein, Berger):
  • there is a connection between question and answer spaces
  • but these types do not give the type of the answer
  • 2. searching for a correct answer in the passage extracted from a text

10
Logical Forms
  • Syntactic analysis plus semantics → logical form (LF)
  • Mapping of question and potential-answer LFs to find the best match (Harabagiu, Webclopedia); see the sketch below
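
A minimal sketch of LF matching, assuming a toy representation of logical forms as sets of predicate tuples (real systems derive LFs from full syntactic-semantic parses and use unification rather than predicate-name overlap):

```python
# Toy logical forms: sets of (predicate, argument) tuples;
# variables start with '?'. All structures invented for illustration.
question = {("born", "?x"), ("person", "Mozart"), ("date", "?x")}

candidates = {
    "Mozart was born in 1756": {("born", "e1"), ("person", "Mozart"), ("date", "1756")},
    "Mozart wrote operas":     {("write", "e2"), ("person", "Mozart"), ("opera", "e2")},
}

def score(q_lf, a_lf):
    """Crude match: count shared predicate names (a stand-in for
    the unification-based LF matching described above)."""
    return len({t[0] for t in q_lf} & {t[0] for t in a_lf})

best = max(candidates, key=lambda s: score(question, candidates[s]))
print(best)  # Mozart was born in 1756
```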

11
Query formation
  • WordNet: synonyms, hyponyms, etc. (a WordNet expansion sketch follows this list)
  • Morphology: verbal forms, plural/singular nouns, etc.
  • Knowledge of the domain (IBM's system)
  • Statistical methods for connecting question and answer spaces:
  • Agichtein: automatic acquisition of patterns that might be good candidates for query expansion
  • 4 types of questions
  • Berger: to facilitate query modification (expansion), each question term gets a set of answer terms
  • FAQ: closed set of question-answer pairs
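
A minimal sketch of WordNet-based query expansion, assuming NLTK and its WordNet corpus are available (`pip install nltk`); the term weighting and filtering a real system would need are omitted:

```python
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)  # one-time corpus download

def expand(term: str, max_terms: int = 5) -> set:
    """Collect synonyms and direct hyponyms of `term` from WordNet."""
    expansions = set()
    for synset in wn.synsets(term):
        for lemma in synset.lemmas():
            expansions.add(lemma.name().replace("_", " "))
        for hypo in synset.hyponyms():
            for lemma in hypo.lemmas():
                expansions.add(lemma.name().replace("_", " "))
    expansions.discard(term)
    return set(sorted(expansions)[:max_terms])

print(expand("invent"))  # e.g. {'contrive', 'devise', ...}
```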

12
Information retrieval
  • Classical IR is the first step of QA
  • Vector-space model: calculation of similarity between terms in the query and terms in the document (a cosine-similarity sketch follows this list)
  • IR techniques used in current QA systems usually target a single collection (either the Web or the TREC collection)
  • Is it possible to apply Distributed IR techniques?
  • domain-restricted QA with extra knowledge about the text collection
  • IBM system:
  • splitting one big collection of documents into smaller collections about specific topics
  • it might require a change in classification: the type of the question might cause changes in query formulation, document extraction, and answer extraction
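
A minimal sketch of the vector-space model with raw term-frequency vectors and cosine similarity (real systems add TF-IDF weighting, stemming, and stopword removal):

```python
import math
from collections import Counter

def cosine(a: str, b: str) -> float:
    """Cosine similarity between term-frequency vectors of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

# Toy query and documents, invented for illustration.
query = "when was mozart born"
docs = ["mozart was born in salzburg in 1756",
        "beethoven wrote nine symphonies"]
ranked = sorted(docs, key=lambda d: cosine(query, d), reverse=True)
print(ranked[0])  # the Mozart document
```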

13
[Pipeline diagram repeated: question → question analysis → query → extracted documents → rules for answer extraction → list of answers]
14
Passage extraction
  • Passages of a particular length (Cardie): vector representation for each passage
  • Paragraphs or sentences
  • Classical text excerpting (see the sketch below):
  • each sentence is assigned a score
  • retrieved passages are formed by taking the sentences with the highest scores
  • Global-Local Processing (Salton)
  • McCallum: passage extraction based not only on words but also on other features (e.g., syntactic constructions)
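
A minimal sketch of classical text excerpting: score each sentence by query-term overlap and keep the top-scoring ones (a real system would add stemming, term weighting, and positional features):

```python
def extract_passage(text: str, query: str, top_n: int = 2) -> list:
    """Return the top_n sentences with the highest query-term overlap."""
    q_terms = set(query.lower().split())
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    scored = sorted(sentences,
                    key=lambda s: len(q_terms & set(s.lower().split())),
                    reverse=True)
    return scored[:top_n]

# Toy document, invented for illustration.
doc = ("Mozart was born in Salzburg in 1756. He toured Europe as a child. "
       "Beethoven admired his work.")
print(extract_passage(doc, "where was Mozart born"))
```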

15
Information Extraction
  • Domain dependency (Grishman):
  • a predefined set of attributes for the search, specific to each topic (e.g., terrorism: victims, locations, perpetrators)
  • usually a lot of manually tagged data for training, or
  • texts divided into two groups: one topic vs. all other texts (Riloff)
  • in both cases, division into topics is a necessary step, which is not applicable to open-domain QA systems

16
What information can be extracted (IE)
  • Named entities (NE-tagging); a minimal NE-tagging sketch follows this list
  • Numbers (incl. dates, ZIP codes, etc.)
  • Proper names (locations, people, etc.)
  • Other, depending on the system
  • TREC-8: 80% of the questions asked for NEs
  • NEs might also support:
  • Correlated entity: mini-CV (Srihari)
  • Who is Julian Hill? → name, age, gender, position, affiliation, education
  • General events (Srihari): who did what to whom, when
  • More complicated IE techniques lead QA back to the AI approach
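
A minimal sketch of NE-tagging with handwritten regular expressions; the patterns below are invented for illustration, and real systems use trained taggers rather than a naive capitalized-words heuristic:

```python
import re

PATTERNS = {
    "DATE":   r"\b(?:\d{1,2}\s+)?(?:January|February|March|April|May|June|"
              r"July|August|September|October|November|December)\s+\d{4}\b"
              r"|\b\d{4}\b",
    "ZIP":    r"\b\d{5}(?:-\d{4})?\b",
    "PERSON": r"\b[A-Z][a-z]+\s+[A-Z][a-z]+\b",  # naive: two capitalized words
}

def tag_entities(text: str) -> list:
    """Return (label, span) pairs for every pattern match in `text`."""
    found = []
    for label, pattern in PATTERNS.items():
        for m in re.finditer(pattern, text):
            found.append((label, m.group()))
    return found

print(tag_entities("Julian Hill joined DuPont in January 1928."))
# [('DATE', 'January 1928'), ('PERSON', 'Julian Hill')]
```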

17
Answer Extraction
  • Three main techniques for answer extraction are based on:
  • syntactic-semantic tree dependencies (Harabagiu, Webclopedia):
  • the LF of the question is mapped to the LFs of possible answers
  • surface patterns (Webclopedia); see the sketch below:
  • <Name> (<Answer> -
  • <Name> was born on <Answer>
  • good patterns require detailed classification: NUMBER vs. DOB
  • text window:
  • Cardie: query-dependent text summarization of text passages with/without syntactic and semantic information

Analogy with MT: LF mapping ~ classical MT; surface patterns ~ example-based MT; text window ~ statistical MT
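
A minimal sketch of surface-pattern answer extraction for birth-date questions, using the two patterns cited above; the date regexes are simplified for illustration:

```python
import re

def extract_birth_date(name: str, text: str):
    """Try the patterns '<Name> (<Answer> -' and
    '<Name> was born on <Answer>' against `text`."""
    patterns = [
        re.escape(name) + r"\s*\((\d{4})\s*-",             # Mozart (1756 -
        re.escape(name) + r" was born on ([\w ,]*\d{4})",  # born on Jan 27, 1756
    ]
    for p in patterns:
        m = re.search(p, text)
        if m:
            return m.group(1)
    return None

print(extract_birth_date("Mozart", "Mozart (1756 - 1791) was a composer."))
# 1756
```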
18
Usage of the Web (answer redundancy)
  • Multiple formulations of an answer can be useful for:
  • 1. the IR stage: increased chances to find an answer that matches the query (Clarke, Brill)
  • no need to search for an exact formulation of the answer
  • 2. the IE stage: facilitation of answer extraction (Agichtein, Ravichandran, Brill)
  • create a list of patterns which might contain the answer, either completely automatically (Agichtein) or using handwritten filters based on question types and domain (Brill)
  • Answer validation (Magnini):
  • a correct answer shows redundancy (a redundancy-voting sketch follows this list)
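
A minimal sketch of redundancy-based answer selection: extract candidate answers from many retrieved snippets and vote by frequency (the snippets and the year-shaped candidate regex are invented for illustration):

```python
import re
from collections import Counter

snippets = [
    "Mozart was born in 1756 in Salzburg.",
    "Wolfgang Amadeus Mozart (1756 - 1791) was a composer.",
    "Mozart, born 1756, wrote over 600 works.",
    "Some sources incorrectly list 1755.",
]

candidates = Counter()
for s in snippets:
    candidates.update(re.findall(r"\b1[0-9]{3}\b", s))  # year-like tokens

answer, votes = candidates.most_common(1)[0]
print(answer, votes)  # 1756 3 -- the redundant answer wins
```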

19
Domain-restricted applications
  • FAQ (different from IR or QA):
  • match the input question against a list of already existing questions (see the sketch after this list)
  • predefined output (according to the above question matching)
  • Riloff:
  • 5 types of questions
  • answer extraction from a given text → no IR stage
  • there is always an answer (a unique answer)
  • IBM system:
  • based on good knowledge of the inner structure of the IBM web site
  • use of FAQ techniques
  • results are better than for open-domain QA systems

Analogy: restricted-domain MT vs. open-domain MT
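
A minimal sketch of FAQ matching: pick the stored question with the highest word overlap with the input and return its canned answer (the FAQ entries are invented for illustration; a real system would use a better similarity measure and a no-match threshold):

```python
import re

faq = {
    "how do I reset my password": "Use the 'Forgot password' link.",
    "where can I download the software": "See the Downloads page.",
}

def tokens(s: str) -> set:
    return set(re.findall(r"\w+", s.lower()))

def answer(user_question: str) -> str:
    """Return the canned answer of the best-matching stored question."""
    uq = tokens(user_question)
    best_q = max(faq, key=lambda q: len(uq & tokens(q)))
    return faq[best_q]

print(answer("How can I reset the password?"))
# Use the 'Forgot password' link.
```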
20
Evaluation
  • IR and IE have different evaluation measures:
  • IR: each document is marked either relevant or non-relevant → recall, precision
  • IE: a gold-standard answer key enumerates all acceptable responses → recall, precision
  • QA: mean reciprocal rank (MRR); a scoring sketch follows the formula below
  • For each question, receive a score equal to the reciprocal of the rank of the first correct response, or 0 if no correct response is found.
  • The overall system score is the mean of the individual question scores.

N = number of questions asked; K_i = rank of the first correct answer for question i; RAR_i = 1/K_i (or 0 if no correct answer); MRR = (1/N) * Σ_i RAR_i
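
A minimal sketch of MRR scoring; `ranks` holds, for each question, the rank of the first correct response (None if no correct response was found), with the example values invented for illustration:

```python
def mrr(ranks: list) -> float:
    """Mean reciprocal rank: average of 1/rank, with 0 for misses."""
    scores = [1.0 / r if r else 0.0 for r in ranks]
    return sum(scores) / len(ranks)

print(mrr([1, 2, None, 5]))  # (1 + 0.5 + 0 + 0.2) / 4 = 0.425
```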
21
Future of QA
FROM:
  • Questions: simple facts
  • Answers: simple factoid answers found in a single document
TO:
  • Questions: complex; use judgments and terms; knowledge of user context needed
  • Answers: search multiple sources; fusion of information; resolution of conflicting data; interpretations, conclusions