Title: Stuff to Add
1Language Technologies Institute, Carnegie Mellon University
AQUAINT 24-Month Workshop, December 2, 2003
2JAVELIN Team
3Outline
- Goals of the Research
- Overall Accomplishments
- Summary of the Approach
- Testing and Evaluation
- Remaining Challenges
- Future Work
4Goals of the Research
- QA as Planning
- Create a general QA planning system
- How should a QA system represent its chain of reasoning?
- QA and Auditability
- How can we improve a QA system's ability to justify its steps?
- How can we make QA systems open to machine learning?
5Goals 2
- Utility-Based Information Fusion
- Perceived utility is a function of many different factors
- Create and tune utility metrics
- Architectures for QA
- Distributed development
- Modular integration (mix & match components)
- Loose integration via shared data standards
- Impact of multilingual QA on design
- Long-term memory
6Overall Accomplishments
- First end-to-end QA project at CMU
- Completed system operational w/limited question type coverage
- Established distributed modular architecture for plug-and-play of individual QA components
- Established automatic testing framework
- Created a long-term repository of questions, answers and intermediate results (documents, passages, etc.)
- Created a web-based browser for answer justification structures
- Multi-strategy approach with dynamic planning
7Overall Accomplishments 2
- Implemented several approaches for answer extraction (pattern-based, statistical, and NLP-based)
- Participated in TREC 11 and 12 QA track evaluations
- Co-organized (with IBM) and participated in Relationship Pilot evaluation
- Preliminary system delivered to MITRE testbed
- Extended architecture for multilingual document bases (includes incorporation of Japanese and Chinese text processing tools)
- Graphical user interface with support for user clarification
8Summary of the Approach
- Basic Steps (Modules) in the QA Process
- Question Analyzer (QA)
- Retrieval Strategist (RS)
- Information Extractor (IX)
- Answer Generator (AG)
- Overall Architecture / Integration
- Graphical User Interface (GUI)
- Planner
- Execution Manager (EM)
- Repository
- Answer Justification (AJ)
- Extensions to the Architecture
- NLP for Information Extraction
- Multilingual document sources
9Question Analyzer (QA)
- Combines pattern-matching and NLP
- Request Object
- Question, question type, answer type
- Keywords, alternate forms (abbrevs, translations)
- Syntactic analysis (f-structure)
- Semantic analysis (logical representation)
[Diagram labels: Question input (XML format); WordNet; KANTOO Lexicon; Brill Tagger; BBN IdentiFinder; KANTOO Lexifier; Tokenizer; Token information extraction, Phrase Chunking; Annotated Token List; Type Taxonomies; Type-Specific Constraints; KANTOO grammars; Parser; F-structure; Request Object Builder; Extraction Patterns; Heuristics; Request Object (XML format)]
10Question Analyzer Performance on TREC 11 Questions
11Retrieval Strategist (RS)
- Input
- Keyword list (from Request Object)
- Max. number of documents
- Collection(s) to search
- Each keyword is assigned a priority (1-5)
- Likelihood that a keyword will appear in an answer passage
- Start with highly constrained search
- All keywords, in close proximity
- Iterate while more documents needed
- Retrieve documents
- Relax query by one step (up to 15 steps): relax keywords, proximity window
- Hybrid approach: start with structured queries, switch to tf.idf (combination works better than either alone)
- Output: ranked document result list
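The iterate-and-relax loop described on this slide can be sketched as follows. This is a minimal illustration, not the JAVELIN implementation: the `search` callback, the starting window size, the assumption that priority 5 is the highest, and the policy of alternating between widening the window and dropping the lowest-priority keyword are all assumptions made for the example. The switch to tf.idf is left out.

```python
from typing import Callable, List, Tuple

# (term, priority 1-5); assumption: 5 means most likely to appear in an answer passage
Keyword = Tuple[str, int]

def relax_and_retrieve(
    keywords: List[Keyword],
    search: Callable[[List[str], int], List[str]],  # hypothetical: (terms, window) -> doc ids
    max_docs: int,
    max_steps: int = 15,
    start_window: int = 20,  # assumed initial proximity window, in tokens
) -> List[str]:
    """Start with all keywords in a tight window, then relax one step at a time."""
    active = sorted(keywords, key=lambda kw: kw[1], reverse=True)
    window = start_window
    results: List[str] = []
    for step in range(max_steps + 1):
        for doc_id in search([term for term, _ in active], window):
            if doc_id not in results:
                results.append(doc_id)
        if len(results) >= max_docs:
            break
        # One relaxation step: alternately widen the window or drop
        # the lowest-priority keyword that is still in the query.
        if step % 2 == 0 or len(active) == 1:
            window *= 2
        else:
            active = active[:-1]
    return results[:max_docs]
```

Documents found by stricter queries come first in the result list, which gives a simple ranking by how constrained the matching query was.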
12Retrieval Strategist Current Work
- Retrieval based on Lemur 2.0 toolkit
- Multiple retrieval models, very flexible
- RS previously used Inquery
- Uses structured query support from UMass
- Extending for use with Chinese, Japanese
- Distributed search (via Lemur)
- Support for querying multiple QA resources
- CORI collection selection algorithm
13Information Extractor (IX)
- Input
- Question (Request Object from QA)
- Set of relevant documents (from RS)
- Output
- Set of potentially useful extracted answers
- Corresponding passages
- Confidence scores
- Role in JAVELIN: Extract candidate answers and passages from documents
14Information Extractor Features
- Self-contained algorithms that score passages in different ways
- Example: Simple Features
- Keywords present
- Normalized window size
- Average distance
- Verbs encompassed Answer,Main Verb
- Proper nouns phrases present
- Example Pattern Features
- cN .. cV .. in/on date
- date, iN .. cV ..
- Any procedure that returns a numeric value is a
valid feature
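A few of the simple features listed above might look like the sketch below. The slide names the features but does not define them, so the exact formulas (fraction of keywords found, keyword span divided by passage length, mean token distance to the answer) are assumptions chosen for illustration.

```python
from typing import List

def keywords_present(keywords: List[str], passage: List[str]) -> float:
    """Fraction of question keywords that occur in the passage."""
    if not keywords:
        return 0.0
    tokens = set(passage)
    return sum(1 for k in keywords if k in tokens) / len(keywords)

def normalized_window_size(keywords: List[str], passage: List[str]) -> float:
    """Size of the span covering all matched keywords, divided by passage length."""
    wanted = set(keywords)
    positions = [i for i, tok in enumerate(passage) if tok in wanted]
    if not positions:
        return 1.0  # assumed worst case when no keyword matches
    return (positions[-1] - positions[0] + 1) / len(passage)

def average_distance(keywords: List[str], passage: List[str], answer_index: int) -> float:
    """Mean token distance between matched keywords and the candidate answer."""
    wanted = set(keywords)
    positions = [i for i, tok in enumerate(passage) if tok in wanted]
    if not positions:
        return float(len(passage))
    return sum(abs(i - answer_index) for i in positions) / len(positions)

def feature_vector(keywords: List[str], passage: List[str], answer_index: int) -> List[float]:
    """Every feature is just a procedure returning a number, so they compose freely."""
    return [
        keywords_present(keywords, passage),
        normalized_window_size(keywords, passage),
        average_distance(keywords, passage, answer_index),
    ]
```

For example, `feature_vector(["bile", "produced"], "bile is produced in the liver".split(), 5)` scores the candidate answer at token position 5 ("liver").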
15Answer Confidence Learning
- Supervised learning
- Model the probability of correctness given a question q, a passage p, and an answer a from the passage
- p(c | q, a, p) ≈ Model(f1(q,a,p), f2(q,a,p), ..., fn(q,a,p))
- where fi are features computed from q, a, and p
- Supervised models
- K-Nearest Neighbors (KNN)
- Decision Tree (DT)
- Support Vector Machine (SVM)
- Finite State Transducers (FST)
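As one concrete instance of the supervised models listed, a K-Nearest Neighbors confidence estimate can be sketched as below. It is a minimal illustration under stated assumptions: the training data format, Euclidean distance, and the use of the fraction of correct neighbours as the confidence score are choices made for the example, not details given on the slide.

```python
import math
from typing import List, Sequence, Tuple

# Each training example: (feature vector f1..fn, whether the answer was correct)
Example = Tuple[Sequence[float], bool]

def knn_confidence(train: List[Example], features: Sequence[float], k: int = 3) -> float:
    """Estimate p(c | q, a, p) as the fraction of the k nearest
    training examples whose answers were labeled correct."""
    if not train:
        raise ValueError("need at least one labeled example")
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], features))[:k]
    return sum(1 for _, correct in nearest if correct) / len(nearest)
```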
16Information Extractor Steps
- Filter passages
- Match answer type?
- Contain sufficient keywords?
- Create variations on passages
- POS tagging (Brill)
- Cleansing (punctuation, tags, etc.)
- Expand contractions
- Reduce surface forms to lexemes
- Calculate feature values
- A classifier scores the passages, which are
output with confidence scores
17Answer Generator (AG)
- Input: answer candidates, source passages
- Output: ranked answers, or requests for more information passed back to Planner
- Not enough answer candidates
- Can't distinguish answer candidates
- Main tasks
- Combination of different sorts of evidence for answer verification.
- Detection and combination of similar answer candidates to address answer granularity.
- Answer type checking to filter out improper answers.
- Generation of answers in required format.
18Answer Normalization
- Request Filler/Answer Generator aware of NE types: dates, times, people names, company names, locations, currency expressions.
- "April 14th, 1912", "14th of April 1912", "14 April 1912": instances of same date, but different strings.
- For date expressions, normalization performed to ISO 8601 (YYYY-MM-DD) in Answer Generator.
- "summer", "last year", etc. remain as strings.
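The date normalization described above can be illustrated with a small sketch. It handles only the three surface forms quoted on the slide and is not the Answer Generator's actual code; the function name and regular expressions are assumptions. Expressions it cannot resolve to a full date are returned unchanged, matching the treatment of "summer" and "last year".

```python
import re
from datetime import date

MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}
_MONTH = "|".join(MONTHS)

# "April 14th, 1912"
_MDY = re.compile(rf"^({_MONTH})\s+(\d{{1,2}})(?:st|nd|rd|th)?,?\s+(\d{{4}})$", re.I)
# "14th of April 1912" and "14 April 1912"
_DMY = re.compile(rf"^(\d{{1,2}})(?:st|nd|rd|th)?\s+(?:of\s+)?({_MONTH})\s+(\d{{4}})$", re.I)

def normalize_date(text: str) -> str:
    """Return an ISO 8601 date (YYYY-MM-DD), or the input unchanged if unresolved."""
    s = text.strip()
    m = _MDY.match(s)
    if m:
        month, day, year = m.group(1), m.group(2), m.group(3)
    else:
        m = _DMY.match(s)
        if not m:
            return text  # e.g. "summer", "last year" stay as strings
        day, month, year = m.group(1), m.group(2), m.group(3)
    try:
        return date(int(year), MONTHS[month.lower()], int(day)).isoformat()
    except ValueError:  # impossible calendar date such as "31 April 1912"
        return text
```

Once normalized, the three variants compare equal as strings, so the Answer Generator can merge them into a single candidate.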
19Answer Type Checking
- Motivation
- Errors in earlier modules or ambiguous information in the document can generate improper answer candidates.
- Not all the answer candidates from IX are potential answers.
- Validate answer candidates by checking how adequate each answer is with respect to the answer type.
- Current approaches
- Use WordNet
- Use Gazetteer for location questions
- Use Google for object questions
- Use internal patterns for numeric and date
questions
20System Architecture
[Diagram labels: JAVELIN GUI; Planner; Domain Model (JAVELIN operator (action) models); Execution Manager; Data Repository (process history data); Answer Justification Web Browser; Question Analyzer; Retrieval Strategist; Information Extractors (FST Extractor, KNN Extractor, Light Extractor, SVM Extractor, NLP Extractor, ...); Answer Generator]
21Graphical User Interface (GUI)
22GUI/Planner Interaction
GUI
Planner
GUI-Initiated
- QUESTION: XML containing
- question text
- planner settings
- PAUSE
- RESUME
- QUIT (end session)
- STOP (abort question)
- ANSWER: XML containing
- answers in rank order
- confidence scores
- repository IDs
- OK
- ERROR: description
Planner-Initiated
- DIALOG: XML containing
- type of dialog (yes/no, multiple choice, text)
- question to ask user
- default response
- choices to display (when applicable)
- RESPONSE: text containing
- yes or no
- text of selected choice
- reply text
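A Planner-initiated DIALOG message and the matching RESPONSE check might be sketched like this. The slide lists what the messages contain but gives no schema, so the element and attribute names (`DIALOG`, `type`, `question`, `default`, `choice`) and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Sequence

def build_dialog(kind: str, question: str, default: str, choices: Sequence[str] = ()) -> str:
    """Serialize a DIALOG request (hypothetical element names)."""
    root = ET.Element("DIALOG", {"type": kind})  # "yes/no", "multiple choice" or "text"
    ET.SubElement(root, "question").text = question
    ET.SubElement(root, "default").text = default
    for choice in choices:  # only meaningful for multiple-choice dialogs
        ET.SubElement(root, "choice").text = choice
    return ET.tostring(root, encoding="unicode")

def parse_response(dialog_xml: str, response_text: str) -> str:
    """Validate the user's RESPONSE text against the dialog it answers."""
    root = ET.fromstring(dialog_xml)
    kind = root.get("type")
    # An empty reply falls back to the default response.
    reply = response_text.strip() or root.findtext("default", default="")
    if kind == "yes/no" and reply.lower() not in ("yes", "no"):
        raise ValueError("expected yes or no")
    if kind == "multiple choice" and reply not in [c.text for c in root.findall("choice")]:
        raise ValueError("reply is not one of the offered choices")
    return reply
```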
23Motivation for Planning
- Enable run-time generation of new question-answering strategies
- Improve ability to recover from bad decisions as information is collected
- Gain insight into when different QA components are most useful
24JAVELIN Planning Approach
- Reasoning at a level above syntactic and lexical details of individual requests
- QA process steps -> planning domain operators
- information consumed/produced by the system -> planning state
- Explicit models of state and action uncertainty
- Utility-based forward-chaining planning algorithm
- Choose actions with maximum expected utility of information
- Interleave planning and execution control of JAVELIN QA components to manage information uncertainty
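The idea of choosing the action with maximum expected utility while interleaving planning and execution can be sketched as follows. This is a simplified, greedy one-step illustration, not the JAVELIN planner: the `Operator` fields, the expected-utility formula (success probability times utility, minus cost) and the callback signatures are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional, Tuple

State = FrozenSet[str]  # assumed: the set of information items produced so far

@dataclass
class Operator:
    name: str
    p_success: float  # modeled chance that the step yields useful information
    utility: float    # value of that information if the step succeeds
    cost: float       # e.g. expected processing time

    def expected_utility(self) -> float:
        return self.p_success * self.utility - self.cost

def choose_action(applicable: List[Operator]) -> Optional[Operator]:
    """Pick the applicable operator with maximum expected utility, if any is worthwhile."""
    best = max(applicable, key=Operator.expected_utility, default=None)
    if best is None or best.expected_utility() <= 0:
        return None
    return best

def plan_and_execute(
    state: State,
    applicable: Callable[[State], List[Operator]],
    execute: Callable[[Operator, State], State],
    goal_satisfied: Callable[[State], bool],
    max_steps: int = 20,
) -> Tuple[State, List[str]]:
    """Forward-chain one step at a time, re-planning from the observed state."""
    trace: List[str] = []
    for _ in range(max_steps):
        if goal_satisfied(state):
            break
        action = choose_action(applicable(state))
        if action is None:  # failure condition: nothing worth trying
            break
        state = execute(action, state)  # the real outcome feeds the next decision
        trace.append(action.name)
    return state, trace
```

Because each decision is made only after the previous step has actually run, a poor retrieval or extraction result changes what the planner considers next, which is how interleaving helps manage uncertainty.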
25System Architecture
[Diagram labels: Domain Model; Data Repository; Planner; JAVELIN GUI; Execution Manager; S0; ...]
Algorithm runs until goal is satisfied or failure conditions are met
26Role of the Execution Manager
- Coordinates communication between Planner and other question-answering components
- Supports session architecture by storing all planning steps and processing data in the Repository
- Simplifies integration of new modules
- Provides centralized Repository access
- Authenticates users for GUI
- Runs batch end-to-end pipeline system tests
27Sequence with Interactivity Enabled
Q: Where is bile produced?
A:
1. liver (0.99175)
2. tube (0.83664)
3. doctors (0.81202)
4. operation (0.81031)
5. Guangdong Province (0.78025)
136 additional answers
28Sequence with Interactivity Enabled
In comparison with non-interactive mode
Q: Where is bile produced?
A:
1. China (0.96944)
2. Moscow (0.75011)
3. Cambridge (0.75011)
4. Guangdong Province (0.60531)
5. Chinese (0.49776)
4 additional answers
300 DS5597 300 RO6180 DS5597 300 FS15985 SVM 300 Q17262 RANKED
time 51 sec
and intermediate results produced by the
interactive mode...
FS15952, AL5445 (SVM):
1. drug (0.73359)
2. liver (0.6497)
3. acid sequestrants (0.49766)
4. LDL-cholesterol (0.47154)
5. rheumatoid arthritis (0.47154)
12 additional answers
FS15935, FS15957 (FST): No answer found
FS15939, AL5440 (SVM): (same as non-interactive mode above)
FS15962, AL5446 (Light):
1. Moscow (0.25)
2. Cambridge (0.25)
3. Dallas (0.01282)
4. China (0.01259)
29Javelin Repository
- The repository stores all the decisions made by the Planner and information produced by the modules in a persistent database
- Permits a detailed trace of the system's operation (a move toward answer justification)
30Repository ERD
Request Object
Planner Objects
Answer Objects
31(No Transcript)
32Adding Shallow Semantics to JAVELIN NLP IX
- Answer extraction module that makes use of natural language processing capabilities
- Currently depends on shallow, broad-coverage parsing
- Similarity-based unification strategy
- Incorporates a general framework for text processing plug-ins
33Basic Idea
partial interpretation
Unification on simple predicates representing basic argument structure will provide a more accurate way to match questions with appropriate answer(s)
Two Challenges:
- Where do predicates come from?
- Flexibility in interpretation
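Unification of a question predicate against a passage predicate can be sketched as below. The predicate format (a name plus a tuple of arguments, with `?`-prefixed variables) and the example predicate `produce(liver, bile)` are assumptions for illustration. The sketch matches predicate names and constants exactly, whereas the approach on these slides is similarity-based, so treat it as the strict special case.

```python
from typing import Dict, Optional, Tuple

Predicate = Tuple[str, Tuple[str, ...]]  # (predicate name, arguments)

def unify(question_pred: Predicate, passage_pred: Predicate) -> Optional[Dict[str, str]]:
    """Return variable bindings if the two predicates unify, else None."""
    q_name, q_args = question_pred
    p_name, p_args = passage_pred
    if q_name != p_name or len(q_args) != len(p_args):
        return None
    bindings: Dict[str, str] = {}
    for q, p in zip(q_args, p_args):
        if q.startswith("?"):
            # A variable must bind to the same value everywhere it appears.
            if bindings.setdefault(q, p) != p:
                return None
        elif q != p:
            return None
    return bindings
```

With "Where is bile produced?" represented as `("produce", ("?where", "bile"))` and a passage predicate `("produce", ("liver", "bile"))`, the binding for `?where` is the candidate answer.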
34Comparing LR for Question and Answer Passages
35Text Processor (TP)
- Complex question analysis requires many types of language processing
- Simple: Tokenization, POS tagging
- Harder: Synonym expansion, syntax
- Hardest: Semantic frames, temporal info
- Collect all of these services into a single module
- CLAWS POS tagger
- RASP syntactic parser
- Link grammar parser
- WordNet synsets
- FrameNet semantic frames
- BBN IdentiFinder NE tagging
36Linguistic Reasoning about Domain Content
- More complex questions require more complex reasoning.
- Joe has access to weapons-grade anthrax.
- Joe is thought to possess warheads capable of delivering biological agents.
- Is Joe capable of mounting a biological attack?
- Requires inference over information drawn from multiple documents
Planner: reasoning about the QA process
FLOOD: reasoning about domain content
37FLOOD Reasoner
- FLOOD is an environment for developing reasoners for complex question analysis
- Consumes semantic frame information from the text processor
- Provides a planning platform for rule specification
- Allows complex operations such as subqueries, etc.
38Pronoun Resolution
- Same sentence
- This article also states that intelligence sources world-wide have been on a "manhunt" the last several weeks for bin Ladin due to reports that he had purchased nuclear weapons.
- Previous sentence
- There are still constraints on Saddam's power. His economic infrastructure is in long-term decline, and his ability to project power outside Iraq's borders is severely limited, largely because of the effectiveness and enforcement of the No-Fly Zones.
- Intervening discourse
- Iraq is forging ahead with its outlawed chemical, nuclear and germ weapon programs as well as with the development of missiles to deliver them, Defense Secretary Donald Rumsfeld said on Friday. Saddam Hussein's appetites for these weapons is enormous, he said in an interview with the Fox News Channel.
39General Process
- Parse the retrieved text
- morphology (POS, stem)
- lexical information (NE tagger, WordNet)
- syntactic structure (RASP)
- grammatical functions (Link)
- Assign agreement features: gender, person, number, animacy
- Select possible antecedents (NPs agreeing with the pronoun)
- Prune candidates according to
- Known linguistic principles, where applicable
- Heuristics (from Mitamura et al., 2003)
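The agreement-based selection step can be sketched as follows. The `Mention` fields, the feature values, and the recency ordering are assumptions for illustration; the ordering is a generic heuristic, not one taken from Mitamura et al. (2003).

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Mention:
    text: str
    gender: str    # "m", "f" or "n"
    person: int    # 1, 2 or 3
    number: str    # "sg" or "pl"
    animate: bool
    position: int  # token offset in the discourse

def agrees(pronoun: Mention, np: Mention) -> bool:
    """True when all four agreement features match."""
    return (pronoun.gender == np.gender and pronoun.person == np.person
            and pronoun.number == np.number and pronoun.animate == np.animate)

def candidate_antecedents(pronoun: Mention, nps: List[Mention]) -> List[Mention]:
    """NPs that precede the pronoun and agree with it, nearest first."""
    candidates = [np for np in nps
                  if np.position < pronoun.position and agrees(pronoun, np)]
    return sorted(candidates, key=lambda np: pronoun.position - np.position)
```

In the Rumsfeld example, both "Donald Rumsfeld" and "Saddam Hussein" agree with "he", and the nearer candidate is the wrong one, which is why the later pruning by linguistic principles and heuristics is needed.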
40Multilingual JAVELIN Architecture
[Diagram labels: questions; Question Analyzer; Multilingual Question Object (UTF-8); Multilingual Retrieval Strategist (UTF-8); Encoding Conversion; Chinese Corpora (GBK); English Corpora (ASCII); Japanese Corpora (EUC-JP); for each of Chinese, Japanese and English: Request Object (UTF-8), IX module collection, Answer Generator, Answers (UTF-8); Ongoing/Future Work]
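The Encoding Conversion step, which brings corpus text in its native encoding into the UTF-8 used by the rest of the pipeline, can be illustrated with Python's built-in codecs. The function name and the language-to-encoding table are assumptions that mirror the encodings named in the diagram; this is not the JAVELIN converter.

```python
# Source encodings as labeled in the architecture diagram.
SOURCE_ENCODINGS = {
    "chinese": "gbk",
    "japanese": "euc_jp",
    "english": "ascii",
}

def to_utf8(raw: bytes, language: str) -> bytes:
    """Decode corpus bytes from the language's native encoding and re-encode as UTF-8."""
    text = raw.decode(SOURCE_ENCODINGS[language])  # raises UnicodeDecodeError on bad input
    return text.encode("utf-8")
```

Decoding strictly rather than with `errors="replace"` is a deliberate choice in this sketch: a mislabeled document fails loudly instead of silently putting corrupted text into the index.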
41Japanese Language Resources
- Mainichi Shimbun Corpus
- Full corpus of a major Japanese newspaper for 1998 and 1999 (about 240,000 articles)
- Bilingual Dictionaries
- EDICT (100,000 general entries, 200,000 Japanese personal names, 87,000 Japanese place names, 14,000 scientific terms)
- EIJIRO (English word to Japanese phrase; harder to use, but has 1,080,000 entries)
- Web-based Machine Translation
- Systran
- Amikai
- Named Entity Tagger / Dependency Structure Analyzer
- CaboCha
- POS-tagger
- ChaSen
42Chinese Language Resources
- Corpora
- Xinhua News corpora (in use)
- Xinhua News from 1991-2001
- NTCIR-3 CLIR IR/CLIR Test Collection (future)
- Chinese news articles published in Taiwan in 1998-1999
- Foreign Broadcast Information Service (future)
- Mandarin-English parallel corpora
- Preprocessing (tools from RADD-MT project)
- ASCII character and digit normalization
- Segmentation
- Named entity tagging
- Bilingual Dictionaries
- LDC
- Bilingual word-to-word dictionary
- Bilingual phrase-to-phrase dictionary
- CEDICT
- Chinese-English dictionary
43Testing and Evaluation
- Daily test framework reporting
- Evaluations
- TREC 11 QA Track evaluation
- Relationship Pilot evaluation
- TREC 12 QA Track evaluation
Details available from NIST web site or the JAVELIN home page: http://www.lti.cs.cmu.edu/Research/JAVELIN
44Evaluation Techniques
- Execution Manager can run in "lights out" batch mode
- Regular tests on different test suites (TREC question suites, relationship pilot questions, etc.)
- Results include scores and logs for debugging
- All intermediate results are stored in Repository
45Sample Results
46Sample Log File Excerpt
47Remaining Challenges
- Getting adequate training data for statistical approaches
- Getting adequate lexico-semantic resources for NLP approaches
- Combining existing NLP tools into an integrated framework
- Extending the data model and representations for scenario-based QA
48Future Work
- Variable-Precision Knowledge Representation and Reasoning
- Scenario-Driven Dialogs
- Scenario Representation
- Multilingual, Distributed IR
- Multi-Strategy Information Gathering
- Answer Visualization and Scenario Refinement
49Questions?