1
Language Technologies Institute, Carnegie Mellon University
AQUAINT 24-Month Workshop, December 2, 2003
2
JAVELIN Team
3
Outline
  • Goals of the Research
  • Overall Accomplishments
  • Summary of the Approach
  • Testing and Evaluation
  • Remaining Challenges
  • Future Work

4
Goals of the Research
  • QA as Planning
  • Create a general QA planning system
  • How should a QA system represent its chain of
    reasoning?
  • QA and Auditability
  • How can we improve a QA system's ability to
    justify its steps?
  • How can we make QA systems open to machine
    learning?

5
Goals 2
  • Utility-Based Information Fusion
  • Perceived utility is a function of many different
    factors
  • Create and tune utility metrics
  • Architectures for QA
  • Distributed development
  • Modular integration (mix-and-match components)
  • Loose integration via shared data standards
  • Impact of multilingual QA on design
  • Long-term memory

6
Overall Accomplishments
  • First end-to-end QA project at CMU
  • Completed system operational w/limited question
    type coverage
  • Established distributed modular architecture for
    plug-and-play of individual QA components
  • Established automatic testing framework
  • Created a long-term repository of questions,
    answers and intermediate results (documents,
    passages, etc.)
  • Created a web-based browser for answer
    justification structures
  • Multi-strategy approach with dynamic planning

7
Overall Accomplishments 2
  • Implemented several approaches for answer
    extraction (pattern-based, statistical, and
    NLP-based)
  • Participated in TREC 11 and 12 QA track evaluations
  • Co-organized (with IBM) and participated in
    Relationship Pilot evaluation
  • Preliminary system delivered to MITRE testbed
  • Extended architecture for multilingual document
    bases (includes incorporation of Japanese and
    Chinese text processing tools)
  • Graphical user interface with support for user
    clarification

8
Summary of the Approach
  • Basic Steps (Modules) in the QA Process
  • Question Analyzer (QA)
  • Retrieval Strategist (RS)
  • Information Extractor (IX)
  • Answer Generator (AG)
  • Overall Architecture / Integration
  • Graphical User Interface (GUI)
  • Planner
  • Execution Manager (EM)
  • Repository
  • Answer Justification (AJ)
  • Extensions to the Architecture
  • NLP for Information Extraction
  • Multilingual document sources

9
Question Analyzer (QA)
[Diagram: question input (XML format) passes through a tokenizer
(token information extraction, phrase chunking) built on the Brill
Tagger, BBN IdentiFinder, and KANTOO Lexifier, drawing on WordNet
and the KANTOO lexicon.]
  • Combines pattern-matching and NLP
  • Request Object
  • Question, question type, answer type
  • Keywords, alternate forms (abbrevs, translations)
  • Syntactic analysis (f-structure)
  • Semantic analysis (logical representation)

[Diagram, continued: the annotated token list is parsed with KANTOO
grammars into an f-structure; the Request Object Builder applies
type taxonomies, type-specific constraints, extraction patterns,
and heuristics to produce the Request Object (XML format).]
10
Question Analyzer Performance on TREC 11 Questions
11
Retrieval Strategist (RS)
  • Input:
  • Keyword list (from Request Object)
  • Max. number of documents
  • Collection(s) to search
  • Each keyword is assigned a priority (1-5): the
    likelihood that the keyword will appear in an
    answer passage
  • Start with a highly constrained search:
  • All keywords, in close proximity
  • Iterate while more documents are needed:
  • Retrieve documents
  • Relax the query by one step (up to 15 steps):
    relax keywords, widen the proximity window
  • Hybrid approach: start with structured queries,
    switch to tf.idf (the combination works better than
    either alone)
  • Output: ranked document result list (a sketch of
    the relaxation loop follows below)
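
A minimal sketch of the relaxation loop in Python. The helper
build_query, the backend interface, and the exact relaxation
schedule are illustrative assumptions, not the actual RS code
(which used Inquery/Lemur structured queries):

  # Hedged sketch of the RS relaxation loop; build_query and backend
  # are hypothetical stand-ins for the structured-query API.
  def retrieve(keywords, max_docs, backend, max_steps=15):
      """keywords are sorted by priority (1 = most likely to appear
      in an answer passage); start constrained, then relax stepwise."""
      required = list(keywords)
      proximity = 10                      # proximity window, in words
      results = []
      for _ in range(max_steps):
          query = build_query(required, proximity)
          results = backend.search(query, limit=max_docs)
          if len(results) >= max_docs:    # enough documents found
              break
          if proximity < 250:             # relax one step: widen window,
              proximity *= 2
          elif len(required) > 1:         # ...then drop low-priority keywords
              required.pop()
      return results                      # ranked document result list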

12
Retrieval Strategist: Current Work
  • Retrieval based on Lemur 2.0 toolkit
  • Multiple retrieval models, very flexible
  • RS previously used Inquery
  • Uses structured query support from UMass
  • Extending for use with Chinese, Japanese
  • Distributed search (via Lemur)
  • Support for querying multiple QA resources
  • CORI collection selection algorithm

13
Information Extractor (IX)
  • Input
  • Question (Request Object from QA)
  • Set of relevant documents (from RS)
  • Output
  • Set of potentially useful extracted answers
  • Corresponding passages
  • Confidence scores
  • Role in JAVELIN: extract candidate answer
    passages from documents

14
Information Extractor Features
  • Self-contained algorithms that score passages in
    different ways
  • Example simple features:
  • Keywords present
  • Normalized window size
  • Average distance
  • Verbs encompassed (Answer, Main Verb)
  • Proper noun phrases present
  • Example pattern features:
  • cN .. cV .. in/on date
  • date, iN .. cV ..
  • Any procedure that returns a numeric value is a
    valid feature
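
Since any numeric-valued procedure qualifies, a feature is just a
function. Two of the simple features above might be sketched like
this in Python (the tokenized-passage representation is an
assumption):

  # Illustrative feature functions; each maps (keywords, passage tokens)
  # to a number, which is all the IX requires of a feature.
  def keywords_present(keywords, tokens):
      """Fraction of question keywords that occur in the passage."""
      return sum(k in tokens for k in keywords) / max(len(keywords), 1)

  def normalized_window_size(keywords, tokens):
      """Smallest token span covering all keyword matches, divided by
      passage length (1.0 when no keyword matches at all)."""
      pos = [i for i, t in enumerate(tokens) if t in keywords]
      if not pos:
          return 1.0
      return (max(pos) - min(pos) + 1) / len(tokens)

  FEATURES = [keywords_present, normalized_window_size]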

15
Answer Confidence Learning
  • Supervised learning
  • Model the probability of correctness given a
    question q, a passage p, and an answer a from the
    passage:
  • p(c | q, a, p) ≈ Model(f1(q,a,p), f2(q,a,p), ..., fn(q,a,p))
  • where the fi are features computed from q, a,
    and p
  • Supervised models
  • K-Nearest Neighbors (KNN)
  • Decision Tree (DT)
  • Support Vector Machine (SVM)
  • Finite State Transducers (FST)
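
A sketch of the learning setup, with scikit-learn's SVM standing in
for the KNN/DT/SVM/FST learners listed above; the training-data
format is an assumption:

  # Hedged sketch: sklearn's SVC stands in for the supervised models.
  from sklearn.svm import SVC

  def train_confidence_model(X, y):
      """X[i] = [f1(q,a,p), ..., fn(q,a,p)]; y[i] = 1 iff answer i
      was judged correct. Returns a model of p(c | q, a, p)."""
      model = SVC(probability=True)      # enable probability estimates
      model.fit(X, y)
      return model

  def confidence(model, feature_vector):
      """p(correct) for one (question, answer, passage) triple."""
      return model.predict_proba([feature_vector])[0][1]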

16
Information Extractor Steps
  • Filter passages
  • Match answer type?
  • Contain sufficient keywords?
  • Create variations on passages
  • POS tagging (Brill)
  • Cleansing (punctuation, tags, etc.)
  • Expand contractions
  • Reduce surface forms to lexemes
  • Calculate feature values
  • A classifier scores the passages, which are
    output with confidence scores
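
Putting the steps above together, the IX control flow might be
sketched as follows; the thresholds and the helpers clean, tokenize,
answer_type_matches, and candidate_passages are assumptions, and
FEATURES and confidence are the sketches from the previous slides:

  # Hedged sketch of the IX steps; helper functions are stand-ins.
  def extract(request, documents, classifier, min_keywords=2):
      scored = []
      for passage in candidate_passages(documents):
          tokens = tokenize(clean(passage))   # cleansing, contractions,
                                              # lexeme reduction, POS tags
          # Step 1: filter passages
          if not answer_type_matches(request.answer_type, tokens):
              continue
          if sum(k in tokens for k in request.keywords) < min_keywords:
              continue
          # Step 2: calculate feature values
          fv = [f(request.keywords, tokens) for f in FEATURES]
          # Step 3: the classifier assigns a confidence score
          scored.append((passage, confidence(classifier, fv)))
      return sorted(scored, key=lambda pair: pair[1], reverse=True)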

17
Answer Generator (AG)
  • Input: answer candidates, source passages
  • Output: ranked answers, or requests for more
    information passed back to the Planner, when:
  • Not enough answer candidates
  • Can't distinguish among answer candidates
  • Main tasks:
  • Combination of different sorts of evidence for
    answer verification.
  • Detection and combination of similar answer
    candidates to address answer granularity.
  • Answer type checking to filter out improper
    answers.
  • Generation of answers in required format.

18
Answer Normalization
  • The Request Filler/Answer Generator is aware of NE
    types: dates, times, people names, company names,
    locations, currency expressions.
  • "April 14th, 1912", "14th of April 1912", and "14
    April 1912" are instances of the same date, but
    different strings.
  • For date expressions, normalization is performed to
    ISO 8601 (YYYY-MM-DD) in the Answer Generator; a
    sketch follows below.
  • "summer", "last year", etc. remain as strings.
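
A minimal sketch of the date case; the format list is an assumption,
and the real Answer Generator covered many more NE types and surface
forms:

  # Hedged sketch of date normalization to ISO 8601.
  import re
  from datetime import datetime

  def normalize_date(text):
      """ "April 14th, 1912", "14th of April 1912", and "14 April 1912"
      all map to "1912-04-14"; unparseable strings pass through."""
      t = re.sub(r"(\d+)(st|nd|rd|th)", r"\1", text)  # strip ordinals
      t = t.replace(" of ", " ").replace(",", "")
      for fmt in ("%B %d %Y", "%d %B %Y"):
          try:
              return datetime.strptime(t, fmt).strftime("%Y-%m-%d")
          except ValueError:
              pass
      return text   # "summer", "last year", etc. remain as strings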

19
Answer Type Checking
  • Motivation
  • Errors in earlier modules or ambiguous
    information in the document can generate improper
    answer candidates.
  • Not all answer candidates from the IX are
    viable answers.
  • Validate answer candidates by checking how
    adequate each answer is with respect to the
    answer type.
  • Current approaches
  • Use WordNet
  • Use Gazetteer for location questions
  • Use Google for object questions
  • Use internal patterns for numeric and date
    questions
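
The WordNet check can be sketched as a hypernym test; here NLTK's
WordNet interface stands in for whatever interface JAVELIN actually
used, and the gazetteer, Google, and pattern checks are omitted:

  # Hedged sketch: is the candidate's hypernym chain compatible with
  # the expected answer type?
  from nltk.corpus import wordnet as wn

  def type_check(candidate, answer_type):
      """e.g. type_check("liver", "organ") -> True for some sense."""
      targets = set(wn.synsets(answer_type, pos=wn.NOUN))
      for sense in wn.synsets(candidate, pos=wn.NOUN):
          for path in sense.hypernym_paths():
              if targets & set(path):
                  return True
      return False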

20
System Architecture
[Diagram: the JAVELIN GUI and Question Analyzer feed the Planner,
which works from a Domain Model of JAVELIN operator (action) models
and drives the Execution Manager over the QA components: the
Retrieval Strategist, the Information Extractors (FST, KNN, SVM,
Light, NLP, ...), and the Answer Generator. A Data Repository
stores process history data, and an Answer Justification web
browser sits on top.]
21
Graphical User Interface (GUI)
22
GUI/Planner Interaction
GUI → Planner (GUI-initiated):
  • QUESTION: XML containing
  • question text
  • planner settings
  • PAUSE
  • RESUME
  • QUIT (end session)
  • STOP (abort question)

Planner → GUI (replies):
  • ANSWER: XML containing
  • answers in rank order
  • confidence scores
  • repository IDs
  • OK
  • ERROR: description

Planner → GUI (Planner-initiated):
  • DIALOG: XML containing
  • type of dialog (yes/no, multiple choice, text)
  • question to ask user
  • default response
  • choices to display (when applicable)

GUI → Planner (response):
  • RESPONSE: text containing
  • yes or no
  • text of selected choice
  • reply text
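
As an illustration of the message flow, the GUI side might build a
QUESTION and decode an ANSWER as below; the element and attribute
names are assumptions, not the actual JAVELIN message DTD:

  # Hedged sketch of the GUI side of the protocol; tag names invented.
  import xml.etree.ElementTree as ET

  def make_question(text, settings):
      msg = ET.Element("QUESTION")
      ET.SubElement(msg, "text").text = text
      for name, value in settings.items():        # planner settings
          ET.SubElement(msg, "setting", name=name, value=str(value))
      return ET.tostring(msg, encoding="unicode")

  def parse_answer(xml_string):
      """Returns (answer text, confidence, repository ID) in rank order."""
      root = ET.fromstring(xml_string)
      return [(a.findtext("text"), float(a.get("confidence")),
               a.get("repository_id")) for a in root.findall("answer")]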
23
Motivation for Planning
  • Enable run-time generation of new
    question-answering strategies
  • Improve ability to recover from bad decisions as
    information is collected
  • Gain insight into when different QA components
    are most useful

24
JAVELIN Planning Approach
  • Reasoning at a level above the syntactic and
    lexical details of individual requests:
  • QA process steps become planning domain operators
  • Information consumed/produced by the system
    becomes the planning state
  • Explicit models of state and action uncertainty
  • Utility-based forward-chaining planning algorithm:
  • Choose actions with maximum expected utility of
    information (see the sketch below)
  • Interleave planning and execution control of the
    JAVELIN QA components to manage information
    uncertainty
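
A skeletal rendering of the planning loop; the operator interface
and the utility model are illustrative assumptions, not the actual
Planner implementation:

  # Hedged sketch of utility-based forward chaining over QA operators.
  def plan_and_execute(state, operators, utility, is_goal, max_steps=20):
      """Each operator models a QA module (RS, IX, AG, ...) with
      predicted outcomes and probabilities; pick the action with
      maximum expected utility, execute it, replan on the new state."""
      for _ in range(max_steps):
          if is_goal(state):
              break
          applicable = [op for op in operators if op.applicable(state)]
          if not applicable:
              break                       # failure conditions met
          best = max(applicable, key=lambda op: sum(
              p * utility(outcome)        # expectation over outcomes
              for outcome, p in op.predicted_outcomes(state)))
          state = best.execute(state)     # interleave planning, execution
      return state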

25
System Architecture
[Diagram: the Planner, guided by the Domain Model, drives the
Execution Manager and Data Repository behind the JAVELIN GUI;
starting from initial state S0, the algorithm runs until the goal
is satisfied or failure conditions are met.]
26
Role of the Execution Manager
  • Coordinates communication between Planner and
    other question-answering components
  • Supports session architecture by storing all
    planning steps and processing data in the
    Repository
  • Simplifies integration of new modules
  • Provides centralized Repository access
  • Authenticates users for GUI
  • Runs batch end-to-end pipeline system tests

27
Sequence with Interactivity Enabled
Q: Where is bile produced?
A: 1. liver (0.99175)  2. tube (0.83664)  3. doctors (0.81202)
4. operation (0.81031)  5. Guangdong Province (0.78025)
(136 additional answers)
28
Sequence with Interactivity Enabled
In comparison with the non-interactive mode:
Q: Where is bile produced?
A: 1. China (0.96944)  2. Moscow (0.75011)  3. Cambridge (0.75011)
4. Guangdong Province (0.60531)  5. Chinese (0.49776)
(4 additional answers)
[Trace: 300 DS5597; 300 RO6180 DS5597; 300 FS15985 SVM;
300 Q17262 RANKED; time: 51 sec]
...and the intermediate results produced by the interactive mode:
FS15952, AL5445 (SVM): 1. drug (0.73359)  2. liver (0.6497)
3. acid sequestrants (0.49766)  4. LDL-cholesterol (0.47154)
5. rheumatoid arthritis (0.47154)  (12 additional answers)
FS15935, FS15957 (FST): no answer found
FS15939, AL5440 (SVM): (same as non-interactive mode above)
FS15962, AL5446 (Light): 1. Moscow (0.25)  2. Cambridge (0.25)
3. Dallas (0.01282)  4. China (0.01259)
29
JAVELIN Repository
  • The Repository stores all the decisions made by
    the Planner and the information produced by the
    modules in a persistent database
  • Permits a detailed trace of the system's
    operation (a move toward answer justification);
    a storage sketch follows below
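
A toy version of the idea, with sqlite3 standing in for the actual
repository database; the schema is an assumption:

  # Hedged sketch of persistent trace storage; schema is illustrative.
  import sqlite3

  conn = sqlite3.connect("javelin_repository.db")
  conn.execute("""CREATE TABLE IF NOT EXISTS trace (
      question_id TEXT, step INTEGER, module TEXT,
      object_id TEXT, payload TEXT)""")

  def record(question_id, step, module, object_id, payload):
      """Each planner decision and module output becomes one row, so
      a question's full chain of reasoning can be replayed later."""
      conn.execute("INSERT INTO trace VALUES (?, ?, ?, ?, ?)",
                   (question_id, step, module, object_id, payload))
      conn.commit()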

30
Repository ERD
Request Object
Planner Objects
Answer Objects
32
Adding Shallow Semantics to JAVELIN: the NLP IX
  • Answer extraction module that makes use of
    natural language processing capabilities
  • Currently depends on shallow, broad-coverage
    parsing
  • Similarity-based unification strategy
  • Incorporates a general framework for text
    processing plug-ins

33
Basic Idea
[Diagram: partial interpretation]
Unification on simple predicates representing basic
argument structure will provide a more accurate way
to match questions with the appropriate answer(s).
Two challenges: Where do predicates come from?
Flexibility in interpretation.
34
Comparing LRs (Logical Representations) for Question and Answer Passages
35
Text Processor (TP)
  • Complex question analysis requires many types of
    language processing
  • Simple: tokenization, POS tagging
  • Harder: synonym expansion, syntax
  • Hardest: semantic frames, temporal information
  • Collect all of these services into a single
    module:
  • CLAWS POS tagger
  • RASP syntactic parser
  • Link grammar parser
  • WordNet synsets
  • FrameNet semantic frames
  • BBN IdentiFinder NE tagging

36
Linguistic Reasoning about Domain Content
  • More complex questions require more complex
    reasoning.
  • Joe has access to weapons-grade anthrax.
  • Joe is thought to possess warheads capable of
    delivering biological agents.
  • Is Joe capable of mounting a biological attack?
  • Requires inference over information drawn from
    multiple documents

Planner: reasoning about the QA process.
FLOOD: reasoning about domain content.
37
FLOOD Reasoner
  • FLOOD is an environment for developing reasoners
    for complex question analysis
  • Consumes semantic frame information from the text
    processor
  • Provides a planning platform for rule
    specification
  • Allows complex operations such as subqueries, etc.

38
Pronoun Resolution
  • Same sentence
  • This article also states that intelligence
    sources world-wide have been on a "manhunt" the
    last several weeks for bin Ladin due to reports
    that he had purchased nuclear weapons.
  • Previous sentence
  • There are still constraints on Saddam's power.
    His economic infrastructure is in long-term
    decline, and his ability to project power outside
    Iraq's borders is severely limited, largely
    because of the effectiveness and enforcement of
    the No-Fly Zones.
  • Intervening discourse
  • "Iraq is forging ahead with its outlawed
    chemical, nuclear and germ weapon programs as
    well as with the development of missiles to
    deliver them," Defense Secretary Donald Rumsfeld
    said on Friday. "Saddam Hussein's appetite for
    these weapons is enormous," he said in an
    interview with the Fox News Channel.

39
General Process
  • Parse the retrieved text
  • morphology (POS, stem)
  • lexical information (NE tagger, WordNet)
  • syntactic structure (RASP)
  • grammatical functions (Link)
  • Assign agreement features: gender, person,
    number, animacy
  • Select possible antecedents (NPs agreeing with
    the pronoun)
  • Prune candidates according to
  • Known linguistic principles, where applicable
  • Heuristics (from Mitamura et al., 2003)
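
In outline, the select-and-prune step might look like the sketch
below; the feature representation and the recency fallback are
simplifications of the cited linguistic principles and heuristics:

  # Hedged sketch of antecedent selection and pruning.
  def resolve(pronoun, candidate_nps):
      """Keep NPs whose gender/person/number/animacy are compatible
      with the pronoun; among survivors, prefer the most recent."""
      def agrees(np):
          return all(np.features.get(f) is None or
                     np.features[f] == pronoun.features[f]
                     for f in ("gender", "person", "number", "animacy"))
      survivors = [np for np in candidate_nps if agrees(np)]
      # Recency stands in for the fuller pruning heuristics above.
      return max(survivors, key=lambda np: np.position, default=None)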

40
Multilingual JAVELIN Architecture
[Diagram (Chinese and Japanese paths marked as ongoing/future work):
the Question Analyzer produces a Multilingual Question Object
(UTF-8), which becomes English, Chinese, and Japanese Request
Objects (UTF-8). The Multilingual Retrieval Strategist (UTF-8)
searches the English Corpora (ASCII), Chinese Corpora (GBK), and
Japanese Corpora (EUC-JP) through encoding conversion. Per-language
IX module collections and Answer Generators produce English,
Chinese, and Japanese Answers (UTF-8).]
41
Japanese Language Resources
  • Mainichi Shimbun Corpus
  • Full corpus of a major Japanese newspaper for
    1998 and 1999 (about 240,000 articles)
  • Bilingual Dictionaries
  • EDICT (100,000 general entries, 200,000 Japanese
    personal names, 87,000 Japanese place names,
    14,000 scientific terms)
  • EIJIRO (English word to Japanese phrase; harder
    to use, but has 1,080,000 entries)
  • Web-based Machine Translation
  • Systran
  • Amikai
  • Named Entity Tagger / Dependency Structure
    Analyzer
  • Cabocha
  • POS-tagger
  • Chasen

42
Chinese Language Resources
  • Corpora
  • Xinhua News corpora (in use)
  • Xinhua News from 1991-2001
  • NTCIR-3 CLIR IR/CLIR Test Collection (future)
  • Chinese news articles published in Taiwan in
    1998-1999
  • Foreign Broadcast Information Service (future)
  • Mandarin-English parallel corpora
  • Preprocessing (tools from RADD-MT project)
  • ASCII character and digit normalization
  • Segmentation
  • Named entity tagging
  • Bilingual Dictionaries
  • LDC
  • Bilingual word-to-word dictionary
  • Bilingual phrase-to-phrase dictionary
  • CEDICT
  • Chinese-English dictionary

43
Testing and Evaluation
  • Daily test framework and reporting
  • Evaluations
  • TREC 11 QA Track evaluation
  • Relationship Pilot evaluation
  • TREC 12 QA Track evaluation

Details available from the NIST web site or the
JAVELIN home page: http://www.lti.cs.cmu.edu/Research/JAVELIN
44
Evaluation Techniques
  • The Execution Manager can run in "lights out"
    batch mode
  • Regular tests on different test suites (TREC
    question suites, relationship pilot questions,
    etc.)
  • Results include scores and logs for debugging
  • All intermediate results are stored in Repository

45
Sample Results
46
Sample Log File Excerpt
47
Remaining Challenges
  • Getting adequate training data for statistical
    approaches
  • Getting adequate lexico-semantic resources for
    NLP approaches
  • Combining existing NLP tools into an integrated
    framework
  • Extending the data model and representations for
    scenario-based QA

48
Future Work
  • Variable-Precision Knowledge Representation and
    Reasoning
  • Scenario-Driven Dialogs
  • Scenario Representation
  • Multilingual, Distributed IR
  • Multi-Strategy Information Gathering
  • Answer Visualization and Scenario Refinement

49
Questions?