Title: From Question-Answering to Information-Seeking Dialogs
1. From Question-Answering to Information-Seeking Dialogs

- Jerry R. Hobbs
- USC Information Sciences Institute
- Marina del Rey, California

(with Chris Culy, Douglas Appelt, David Israel, Peter Jarvis, David Martin, Mark Stickel, and Richard Waldinger of SRI)
2. Key Ideas

1. Logical analysis/decomposition of questions into component questions, using a reasoning engine.
2. Bottoming out in a variety of web resources and an information extraction engine.
3. Use of the component questions to drive subsequent dialog, for elaboration, revision, and clarification.
4. Use of the analysis of questions to determine, formulate, and present answers.
3. Plan of Attack

Inference-based system:
- Inference for question-answering -- this year
- Inference for dialog structure -- beginning now

Incorporate resources:
- Geographical reasoning -- this year
- Temporal reasoning -- this summer
- Agent and action ontology -- this summer
- Document retrieval and information extraction for question-answering -- beginning now
4. An Information-Seeking Scenario

Question: How safe is the Mascat harbor for refueling US Navy ships?

Question decomposition via logical rules:
- Are relations between Oman and the US friendly?
- What recent terrorist incidents have there been in Oman?
- How secure is the Mascat harbor?

Resources attached to the reasoning process (asking the user is one such resource):
- Ask the analyst
- IR/IE engine for searching recent news feeds
- Find a map of the harbor from the DAML-encoded Semantic Web / Intelink
5. Composition of Information from Multiple Sources

Question (parsed by GEMINI, decomposed by SNARK): How far is it from Mascat to Kandahar?

Question decomposition via logical rules:
- What is the lat/long of Mascat?
- What is the lat/long of Kandahar?
- What is the distance between the two lat/longs?

Resources attached to the reasoning process:
- Alexandrian Digital Library Gazetteer (for each lat/long)
- Geographical formula (see the sketch below), or www.nau.edu/cvm/latlongdist.html
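The distance step is straightforward to make concrete. A minimal sketch, assuming the standard haversine great-circle formula in place of whatever formula the system actually attaches; the lat/longs are rough gazetteer-style figures, not system output:

    import math

    def haversine_km(lat1, lon1, lat2, lon2):
        # Great-circle distance between two lat/long points, in km.
        phi1, phi2 = math.radians(lat1), math.radians(lat2)
        dphi = math.radians(lat2 - lat1)
        dlam = math.radians(lon2 - lon1)
        a = (math.sin(dphi / 2) ** 2
             + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
        return 2 * 6371.0 * math.asin(math.sqrt(a))  # 6371 km = mean Earth radius

    # Approximate lat/longs for Mascat and Kandahar:
    print(haversine_km(23.6, 58.5, 31.6, 65.7))  # about 1140 km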
6. Composition of Information from Multiple Sources

Question: Show me the region 100 km north of the capital of Afghanistan.

Question decomposition via logical rules:
- What is the capital of Afghanistan? (CIA Fact Book: Kabul)
- What is the lat/long of Kabul? (Alexandrian Digital Library Gazetteer)
- What is the lat/long 100 km north of that? (geographical formula; sketched below)
- Show that lat/long. (Terravision)

Resources attached to the reasoning process: CIA Fact Book, Alexandrian Digital Library Gazetteer, geographical formula, Terravision.
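The "100 km north" step can also be shown concretely: moving due north leaves longitude unchanged, and one degree of latitude is about 111.32 km everywhere on the globe. A sketch (Kabul's lat/long is approximate, not system output):

    def point_north(lat, lon, km):
        # Due north: latitude grows by km/111.32 degrees; longitude is unchanged.
        return lat + km / 111.32, lon

    print(point_north(34.5, 69.2, 100))  # about (35.4, 69.2)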
7. Combining Time, Space, and Personal Information

Question: Could Mohammed Atta have met with an Iraqi official between 1998 and 2001?

Logical form:
  meet(a,b,t) & 1998 ≤ t ≤ 2001 & official(b,Iraq)

Question decomposition via logical rules:
  at(a,x1,t) & at(b,x2,t) & near(x1,x2)
  go(a,x1,t), go(b,x2,t)

Resources attached to the reasoning process: temporal reasoning (for the date range), the IE engine (for the movements of a and b and for official(b,Iraq)), and geographical reasoning (for near(x1,x2)).
8. System Architecture

Query -> GEMINI (parsing) -> logical form -> SNARK (decomposition and interpretation) -> proof with answer

During the proof, SNARK calls out to web resources and other attached resources.
9. Two Central Systems

GEMINI:
- Large unification grammar of English
- Under development for more than a decade
- Fast parser
- Generates logical forms
- Used in ATIS and CommandTalk

SNARK:
- Large, efficient theorem prover
- Under development for more than a decade
- Built-in temporal and spatial reasoners
- Procedural attachment, including for web resources
- Extracts answers from proofs
- Strategic controls for speed-up
10. Linguistic Variation

- How far is Mascat from Kandahar?
- How far is it from Mascat to Kandahar?
- How far is it from Kandahar to Mascat?
- How far is it between Mascat and Kandahar?
- What is the distance from Mascat to Kandahar?
- What is the distance between Mascat and Kandahar?

GEMINI parses and produces logical forms for most TREC-type queries. The TACITUS and FASTUS lexicons are used to augment the GEMINI lexicon, with unknown-word guessing based on morphology and immediate context (see the sketch below).
11"Snarkification"
Problem GEMINI produces logical forms not
completely aligned with what SNARK theories
need Current solution Write simplification
code to map from one to the
other Long-term solution Logical forms that
are aligned better
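A minimal sketch of what such a simplification pass might look like, assuming logical forms are modeled as nested functor-argument tuples; the entries in the rewrite table are invented for illustration:

    # Rewrite GEMINI-style predicates into the predicates SNARK's
    # theories expect, recursing through nested terms.
    REWRITES = {
        "distance": "distance-between",   # hypothetical lexical-to-core mapping
        "far-from": "distance-between",
    }

    def snarkify(term):
        functor, *args = term
        args = [snarkify(a) if isinstance(a, tuple) else a for a in args]
        return (REWRITES.get(functor, functor), *args)

    print(snarkify(("distance", "?d", "Mascat", "Kandahar")))
    # ('distance-between', '?d', 'Mascat', 'Kandahar')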
12. Relating Lexical Predicates to Core Theory Predicates

Lexical predicates, such as those arising from "... distance ..." and "how far ...", must be related by axioms to core theory predicates such as distance-between. These axioms need to be written for every domain we deal with; we have illustrative examples.
13. Decomposition of Questions

  lat-long(l1,x) & lat-long(l2,y) & lat-long-distance(d,l1,l2) -> distance-between(d,x,y)

We need axioms like this one, relating core theory predicates to predicates from the available resources; we have illustrative examples.
14. Procedural Attachment

A declaration for certain predicates states that there is a procedure for proving the predicate, and which arguments must be bound before it is called, e.g., lat-long(l1,x) and lat-long-distance(d,l1,l2). When a predicate with those arguments bound is generated in the proof, the procedure is executed. A sketch follows.
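A minimal sketch of how such declarations might be organized, with a toy gazetteer standing in for the real attached resource; the predicate name follows the slide, everything else is illustrative:

    GAZETTEER = {"Kabul": (34.5, 69.2)}  # toy stand-in for the ADL Gazetteer

    PROCEDURES = {
        # predicate -> (argument positions that must be bound, procedure)
        # lat-long(l, x): the place name x (position 2) must be bound;
        # the procedure then produces a value to bind to l.
        "lat-long": ((2,), GAZETTEER.get),
    }

    def try_attachment(pred, bound):
        # 'bound' maps argument position -> value for the arguments that
        # are already bound in the goal literal generated during the proof.
        required, proc = PROCEDURES[pred]
        if not all(pos in bound for pos in required):
            return None  # a required argument is still a variable: postpone
        return proc(*(bound[pos] for pos in required))

    # When SNARK generates lat-long(?l, Kabul) with position 2 bound,
    # the attached procedure executes and yields a binding for ?l:
    print(try_attachment("lat-long", {2: "Kabul"}))  # (34.5, 69.2)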
15. Open Agent Architecture

SNARK, GEMINI, and the snarkify component are each wrapped as OAA agents; the external resources are likewise reached via OAA agents.
16. Use of SMART and TextPro

A question is decomposed via logical rules into subquestions (Subquestion-1, Subquestion-2, Subquestion-3, ...), each of which is sent to the resources attached to the reasoning process. SMART/TextPro is one resource among many, alongside the other attached resources.
17. Information Extraction Engine as a Resource

- Document retrieval (SMART) for pre-processing
- TextPro: top-of-the-line information extraction engine; recognizes subject-verb-object and coreference relations
- Analyze the NL query with GEMINI and SNARK
- Bottom out in a pattern for TextPro to seek
- Keyword search on a very large corpus
- TextPro runs over the documents retrieved
18. Linking SNARK with TextPro

During its analysis of the query, SNARK generates a subquery p(?x,c). The call to TextPro is:

  TextSearch(EntType(?x), Terms(p), Terms(c), WSeq) & Analyze(WSeq, p(?x,c)) -> p(?x,c)

- EntType(?x): the type of the questioned constituent
- Terms(p), Terms(c): synonyms and hypernyms of the words associated with p or c
- WSeq: the answer, an ordered sequence of annotated strings of words
- Analyze: matches pieces of the annotated answer strings with pieces of the query

A toy, self-contained rendering of this call follows.
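A sketch with keyword filtering standing in for TextSearch and a single regular expression standing in for TextPro's analysis; the corpus and term lists are invented:

    import re

    CORPUS = ["Samuel Palmisano is CEO of IBM.",
              "IBM reported earnings yesterday."]

    def text_search(terms_p, terms_c, corpus):
        # Keep passages mentioning terms for both the predicate and the constant.
        return [s for s in corpus
                if any(t in s for t in terms_p) and any(t in s for t in terms_c)]

    def analyze(wseq, pattern):
        # Match pieces of the retrieved strings against a piece of the query.
        for s in wseq:
            m = re.search(pattern, s)
            if m:
                return m.group(1)  # the binding for ?x

    wseq = text_search({"CEO", "chief executive"}, {"IBM"}, CORPUS)
    print(analyze(wseq, r"(.+?) is CEO of IBM"))  # Samuel Palmisano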
19. Three Modes of Operation for TextPro

1. Search for predefined patterns and relations (ACE-style) and translate the relations into SNARK's logic.
   "Where does the CEO of IBM live?"
2. Search for subject-verb-object relations in processed text that match the predicate-argument structure of SNARK's logical expression.
   "Samuel Palmisano is CEO of IBM."
3. Search for the passage with the highest density of relevant words and an entity of the right type for the answer.
   "Samuel Palmisano .... CEO .... IBM."
   Use coreference links to get the most informative answer.

(Mode 1 uses the ACE Role and At relations.)
20. First Mode

  TextSearch(Person, Terms(CEO), Terms(IBM), WSeq) & Analyze(WSeq, Role(?x,Management,IBM,CEO)) -> CEO(?x,IBM)

TextPro finds:
  Entity1: Samuel Palmisano, Palmisano, head, he
  Entity2: IBM, International Business Machines, they
  Relation: Role(Entity1, Entity2, Management, CEO)

  <relation TYPE="Role" SUBTYPE="Management">
    <rel_entity_arg ID="Entity1" ARGNUM="1"/>
    <rel_entity_arg ID="Entity2" ARGNUM="2"/>
    <rel_attribute ATTR="POSITION">CEO</rel_attribute>
  </relation>

Analyze yields: CEO(Samuel Palmisano, IBM)

A sketch of the translation into SNARK's logic follows.
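The translation from the ACE-style relation to a logic literal is mechanical. A sketch, assuming the XML on this slide and a table mapping entity IDs to their best names; only the tag names come from the slide:

    import xml.etree.ElementTree as ET

    XML = """
    <relation TYPE="Role" SUBTYPE="Management">
      <rel_entity_arg ID="Entity1" ARGNUM="1"/>
      <rel_entity_arg ID="Entity2" ARGNUM="2"/>
      <rel_attribute ATTR="POSITION">CEO</rel_attribute>
    </relation>
    """
    ENTITIES = {"Entity1": "Samuel Palmisano", "Entity2": "IBM"}

    rel = ET.fromstring(XML)
    args = sorted(rel.findall("rel_entity_arg"), key=lambda a: a.get("ARGNUM"))
    position = rel.find("rel_attribute").text  # "CEO"
    print((position, *(ENTITIES[a.get("ID")] for a in args)))
    # ('CEO', 'Samuel Palmisano', 'IBM')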
21. Three Modes of Operation for TextPro (recap of slide 19; mode 2 is up next)
22. Second Mode

  TextSearch(Person, Terms(CEO), Terms(IBM), WSeq) & Analyze(WSeq, CEO(?x,IBM)) -> CEO(?x,IBM)

TextPro finds:
  "<subj> Samuel Palmisano </subj> <verb> heads </verb> <obj> IBM </obj>"

Analyze yields: CEO(Samuel Palmisano, IBM)
23. Three Modes of Operation for TextPro (recap of slide 19; mode 3 is up next)
24. Third Mode

  TextSearch(Person, Terms(CEO), Terms(IBM), WSeq) & Analyze(WSeq, CEO(?x,IBM)) -> CEO(?x,IBM)

TextPro finds:
  "<person> Samuel Palmisano </person> ...."
     | coref
  "<person> He </person> has recently been rumored to have been appointed Lou Gerstner's successor as <CEOword> CEO </CEOword> of the major computer maker nicknamed <co> Big Blue </co>"

Analyze yields: CEO(Samuel Palmisano, IBM)

A sketch of the density-based passage search follows.
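A rough sketch of the third mode's search: slide a window over the text, score each window by its density of query-relevant terms, and require an entity of the right type inside it. The tokenization and the type test are toy stand-ins for TextPro's annotations:

    def best_passage(tokens, relevant, is_right_type, width=6):
        best, best_score = None, -1
        for i in range(len(tokens) - width + 1):
            window = tokens[i:i + width]
            score = sum(1 for w in window if w.lower() in relevant)
            if score > best_score and any(is_right_type(w) for w in window):
                best, best_score = window, score
        return best

    text = "Samuel Palmisano was named CEO of IBM the computer maker".split()
    print(best_passage(text, {"ceo", "ibm"}, lambda w: w[0].isupper()))
    # ['Palmisano', 'was', 'named', 'CEO', 'of', 'IBM']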
25. Domain-Specific Patterns

- Decide upon a domain (e.g., nonproliferation)
- Compile a list of the principal properties and relations of interest
- Implement these patterns in TextPro
- Implement the link between TextPro and SNARK, converting between templates and logic
26. Challenges

Cross-document identification of individuals:
  Document 1: Osama bin Laden
  Document 2: bin Laden
  Document 3: Usama bin Laden
Do entities with the same or similar names represent the same individual?

Metonymy:
  Text: "Beijing approved the UN resolution on Iraq."
  The query involves China, not Beijing.
27. DAML Search Engine

Tecknowledge has developed a search engine over the entire (soon to be exponentially growing) Semantic Web. A query is a predicate-argument triple, each element tagged with its namespace:

  pred: capital    arg1: ?x    arg2: Indonesia

It also handles conjunctive queries, e.g., the population of the capital of Indonesia.

Problem: you have to know logic and RDF to use it.
28. DAML Search Engine as AQUAINT Web Resource

The AQUAINT system generates the literal capital(?x,Indonesia), and a procedural attachment in SNARK converts it into the search engine's triple form (pred: capital, arg1: ?x, arg2: Indonesia, each with its namespace) and issues the query against the entire Semantic Web. (A sketch of the triple construction follows.)

Solution: you only have to know English to use it. This makes the entire Semantic Web accessible to AQUAINT users.
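A sketch of the triple construction behind that procedural attachment; the field layout mirrors the slide (pred/arg1/arg2, each with a namespace), while the namespace URI and the function name are invented:

    def to_daml_query(pred, arg1, arg2, ns="http://example.org/geo#"):
        # Render a SNARK literal pred(arg1, arg2) as the search engine's
        # triple query; '?x' marks the variable to be bound.
        return {"pred": {"name": pred, "namespace": ns},
                "arg1": {"name": arg1, "namespace": ns},
                "arg2": {"name": arg2, "namespace": ns}}

    print(to_daml_query("capital", "?x", "Indonesia"))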
29. Temporal Reasoning: Structure

- Topology of time: start, end, before, between
- Measures of duration: "for an hour", ...
- Clock and calendar: 3:45 pm, Wednesday, June 12
- Temporal aggregates: "every other Wednesday"
- Deictic time: "last year", ...
30. Temporal Reasoning: Goals

- Develop a temporal ontology (DAML) -- nearly complete
- Reason about time in SNARK (AQUAINT, DAML) -- in progress
- Link with the temporal annotation language TimeML (AQUAINT) -- in progress
- Answer questions with a temporal component (AQUAINT) -- in progress
31. Convergence

- DAML annotation of temporal information on the Web (DAML-Time)
- Annotation of temporal information in text (TimeML)
- Most information on the Web is in text
- The two annotation schemes should be intertranslatable
32. TimeML Annotation Scheme (An Abstract View)

[Diagram: clock/calendar intervals and instants (e.g., 2001, Sept 11), durations (e.g., 6 months), and instantaneous events (e.g., a warning), linked by relations such as inclusion and before.]
33. TimeML Example

"The top commander of a Cambodian resistance force said Thursday he has sent a team to recover the remains of a British mine removal expert kidnapped and presumed killed by Khmer Rouge guerrillas two years ago."

[Diagram: the annotated events (resist, command, said, sent, recover, kidnap, killed, presumed, remove, remain) anchored to the times Thursday, now, and "2 years" before now.]
34. Vision

1. Manual DAML temporal annotation of web resources
2. Manual temporal annotation of a large NL corpus
3. Programs for automatic temporal annotation of NL text
4. Automatic DAML temporal annotation of web resources
35. Spatial and Geographical Reasoning: Structure

- Topology of space: Is Albania a part of Europe?
- Dimensionality, measures: How large is North Korea?
- Orientation and shape: What direction is Monterey from SF?
- Latitude and longitude: Alexandrian Digital Library Gazetteer
- Political divisions: CIA World Fact Book, ...
36. Spatial and Geographical Reasoning: Goals

- Develop a spatial and geographical ontology (DAML)
- Reason about space and geography in SNARK (AQUAINT, DAML)
- Attach spatial and geographical resources (AQUAINT)
- Answer questions with a spatial component (AQUAINT)

Some capability exists now.
37. Rudimentary Ontology of Agents and Actions

Persons and their properties and relations:
- name, alias, (principal) residence
- family and friendship relationships
- movements and interactions

Actions/events:
- types of actions/events
- preconditions and effects
38. Domain-Dependent Ontologies

- Nonproliferation data and task
- Construct the relevant ontologies
39. Dialog Modeling: Approaching It Top Down

Key idea: the system matches the user's utterance with one of several active tasks. Understanding dialog is itself one active task.

Rules have the form
  property(situation) -> active(Task1)
including
  utter(u,w) -> active(DialogTask)
  want(u,Task1) -> active(Task1)

Understanding is matching the utterance (a conjunction of predications) with an active task or with the condition of an inactive task.
40. Dialog Task Model

  understand(a,e,t) <- hear(a,w) & parse(w,e) & match(e,t)

If the match succeeds: the action is determined by the utterance and the task.
If the match fails, with some x unmatched: ask the user about x. (A sketch of this loop follows.)
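A minimal sketch of that loop, assuming the utterance has been parsed into a set of predications and each active task is likewise a set of expected predications; real matching would go through inference rather than set operations:

    def understand(utterance_preds, active_tasks):
        best_unmatched = utterance_preds
        for name, task_preds in active_tasks.items():
            unmatched = utterance_preds - task_preds
            if not unmatched:
                return ("act", name)   # yes: action determined by utterance and task
            best_unmatched = min(best_unmatched, unmatched, key=len)
        return ("ask", best_unmatched)  # no: ask the user about the unmatched x

    tasks = {"show-map": {("show", "map"), ("region", "?r")}}
    print(understand({("show", "map"), ("region", "?r")}, tasks))  # ('act', 'show-map')
    print(understand({("show", "map"), ("scale", "?s")}, tasks))   # ('ask', {('scale', '?s')})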
41. Dialog Modeling: Approaching It Bottom Up

identify(x: p(x)) => identify(x: p(x) & q(x))
- Clarification: "Show me St. Petersburg." "Florida or Russia?"
- Refinement: "Show me a lake in Israel." "Bigger than 100 sq mi."

identify(x: p(x)) => identify(x: p1(x)), where p and p1 are related
- Further properties: "What's the area of the Dead Sea?" "The depth?"
- Change of parameter: "Show me a lake in Israel." "Jordan."
- Correction: "Show me Bryant, Texas." "Bryan."

identify(y: y=f(x)) => identify(z: z=g(y))
- Piping: "What is the capital of Oman?" "What's its population?"

Challenge: narrowing in on the information need.
42. Fixed-Domain QA Evaluation: Why?

"Who is Colin Powell?" "What is naproxen?"

A broad range of domains forces shallow processing; a relatively small fixed domain opens the possibility of deeper processing.
43. Fixed-Domain QA Evaluation

- Pick a domain, e.g., nonproliferation.
- Pick a set of resources, including a corpus of texts, structured databases, and web services.
- Pick 3-4 pages of Text in the domain (to constrain knowledge).
- Have an expert make up 200 realistic questions, answerable with the Text, the non-NL resources, and inference (maybe explicit NL resources).
- Divide the questions into training and test sets.
- Give sites one month to work on the training set.
- Test on the test set and analyze the results.
44. Some Issues

- Range of questions, from easy to impossible
- Form of questions: question templates? let the data determine? maybe 90 manually produced logical forms?
- Form of answers: natural language or XML templates?
- Isolated questions, or sequences related to a fixed scenario? Some of each.
- Community interest: half a dozen sites might participate if the difficulties are worked out.
45. Next Steps

- Pick several candidate Texts.
- Researchers and experts generate questions from those Texts.