Title: START: Natural Language Access to Information
START: Natural Language Access to Information
Boris Katz, Gary Borchardt, Sue Felshin, Jimmy
Lin, Jerome McFarland, Ali Ibrahim, Luciano
Castagnola, Baris Temelkuran, Aaron Fernandes,
Alp Simsek, Jonathan Wolfe, Matthew Bilotti
MIT Artificial Intelligence Lab, http://www.ai.mit.edu/projects/infolab/
I had a dream...
[Figure: a question mark; Library of Congress]
Reality
- What we can do
  - Understand ordinary sentences and questions
- What we can't do (yet)
  - 1. Full-text NL understanding is still beyond reach:
    - Common sense implication
    - Intersentential reference
    - Summarization
  - 2. Not all information is language; most Web resources are not textual:
    - Maps and images
    - Sound and video
    - Multimedia
  - Web resources are distributed across numerous non-traditional databases
Bridging the Gap
[Figure: Library of Congress]
The Solution: Natural Language Annotations
- Annotations bridge the gap between our ability to analyze natural language sentences and our desire to access the huge amount of data available in our libraries and on the Web.
- Annotations are collections of natural language sentences and phrases that describe the content of various information segments.
- START:
  - analyzes these annotations
  - creates the necessary representational structures
  - produces special pointers to the information segments summarized by the annotations
Natural Language Annotations
[Figure: an annotator attaches the annotation "Mars's year is long." to the segment "... one Mars year lasts 687 Earth days." in the START knowledge base.]
User questions:
- How long is the Martian year?
- How long is a year on Mars?
- How many days are in a Martian year?
Answer shown to the user: ... one Mars year lasts 687 Earth days.
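To make the idea concrete, here is a minimal Python sketch of a knowledge base of annotations, where each annotation points at an information segment. It is hypothetical throughout. START parses annotations and questions into T-expressions, but this stand-in (the names `Annotation`, `content_words`, and `answer` are ours) only compares bags of normalized content words.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

# Hypothetical normalization data, for this sketch only.
STOPWORDS = {"how", "is", "the", "a", "on", "in", "are", "many"}
CANONICAL = {"martian": "mars", "mars's": "mars"}


@dataclass(frozen=True)
class Annotation:
    sentence: str  # natural language annotation written by the annotator
    segment: str   # pointer to the annotated segment (here, simply its text)


def content_words(sentence: str) -> FrozenSet[str]:
    """Crude stand-in for parsing: lowercase, drop stopwords, canonicalize."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return frozenset(CANONICAL.get(w, w) for w in words if w not in STOPWORDS)


def answer(question: str, kb: List[Annotation]) -> Optional[str]:
    """Return the segment of the first annotation whose words all occur in the question."""
    q = content_words(question)
    for ann in kb:
        if content_words(ann.sentence) <= q:
            return ann.segment
    return None


KB = [Annotation("Mars's year is long.", "... one Mars year lasts 687 Earth days.")]
```

With this stand-in, the first two questions return the segment. "How many days are in a Martian year?" returns `None` because the word "long" never appears in it. Presumably this kind of paraphrase is where START's deeper analysis matters.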
Parsing
A chain of reactions converts each molecule of glucose into two smaller molecules of pyruvate.
[Figure: parse tree of this sentence, with S, NP, VP, PP, det, noun, prep, and quantity constituents]
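The slide's parse tree can be written as nested tuples. The constituent structure below is our reading of the garbled figure, and the labels `verb` and `adj` are our assumptions. The two helpers (hypothetical names) recover the words of the tree and print it in bracketed form.

```python
from typing import Iterator, Tuple

Tree = Tuple  # (label, child, ...) where each child is a Tree or a word

PARSE: Tree = (
    "S",
    ("NP",
        ("det", "a"),
        ("noun", "chain"),
        ("PP", ("prep", "of"), ("NP", ("noun", "reactions")))),
    ("VP",
        ("verb", "converts"),
        ("NP",
            ("quantifier", "each"),
            ("noun", "molecule"),
            ("PP", ("prep", "of"), ("NP", ("noun", "glucose")))),
        ("PP",
            ("prep", "into"),
            ("NP",
                ("quantity", "two"),
                ("adj", "smaller"),
                ("noun", "molecules"),
                ("PP", ("prep", "of"), ("NP", ("noun", "pyruvate")))))),
)


def leaves(tree: Tree) -> Iterator[str]:
    """Yield the words of the tree from left to right."""
    _label, *children = tree
    for child in children:
        if isinstance(child, str):
            yield child
        else:
            yield from leaves(child)


def bracket(tree: Tree) -> str:
    """Render the tree in labeled-bracket notation, e.g. [NP [det a] [noun chain]]."""
    label, *children = tree
    parts = [c if isinstance(c, str) else bracket(c) for c in children]
    return f"[{label} {' '.join(parts)}]"
```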
Ternary expressions (T-expressions)
A chain of reactions converts each molecule of glucose into two smaller molecules of pyruvate.
[Figure: T-expressions for this sentence; legible fragments include "quantifier each" and "into molecules-5"]
T-expression Representation
- List of node-link-node triples
- Nouns and adjectives are nodes
- Links cover:
  - relationships between verbs and their arguments
  - fundamental semantic relationships: is-a (for equality, membership, and subclass relationships), related-to (for possessives, etc.)
  - modification of nouns: quantifier, quantity, is (for adjectives)
  - prepositions
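The following Python sketch writes out plausible T-expressions for the sample sentence, using the link types listed above. Several choices here are our assumptions: we nest the verb triple inside a preposition triple to attach the prepositional phrase to the verb, the verb appears in base form, and we omit the numeric indices START attaches to nodes (as in `molecules-5`). The `Triple` class and `show` function are hypothetical names.

```python
from typing import NamedTuple, Union


class Triple(NamedTuple):
    subject: "Node"   # a word, or a nested triple (a verb triple modified by a preposition)
    relation: str     # verb, preposition, or a link such as quantifier / quantity / is
    obj: "Node"


Node = Union[str, Triple]

# Plausible T-expressions for:
# "A chain of reactions converts each molecule of glucose into two smaller molecules of pyruvate."
TEXPS = [
    Triple(Triple("chain", "convert", "molecule"), "into", "molecules"),
    Triple("chain", "of", "reactions"),
    Triple("molecule", "quantifier", "each"),
    Triple("molecule", "of", "glucose"),
    Triple("molecules", "quantity", "two"),
    Triple("molecules", "is", "smaller"),
    Triple("molecules", "of", "pyruvate"),
]


def show(node: Node) -> str:
    """Print a node in angle-bracket notation, nesting triples as needed."""
    if isinstance(node, Triple):
        return f"<{show(node.subject)} {node.relation} {show(node.obj)}>"
    return node


if __name__ == "__main__":
    for t in TEXPS:
        print(show(t))
```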
S-rules for Structural Variation
The president impressed the country with his determination.
The president's determination impressed the country.
S-rule for the Property Factoring alternation:
someone1 emotional-reaction-verb someone2 with something
<=>
someone1's something emotional-reaction-verb someone2
[Figure: the same rule expressed over T-expressions, built from the nodes someone1, someone2, and something1 and the links emotional-reaction-verb, with, and related-to]
Emotional reaction verbs: surprise, stun, amaze, startle, impress, please, embarrass, annoy, etc.
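Here is a minimal Python sketch of applying the Property Factoring S-rule to T-expressions. It assumes the T-expression shapes we read from the diagram: a verb triple modified by `with`, plus a `related-to` link from the property to its owner. Triples are plain tuples here, and `property_factoring` is a hypothetical name. The sketch applies the rule in one direction only, adding the factored form alongside the original so that either phrasing can match.

```python
from typing import Set, Tuple, Union

Node = Union[str, tuple]
Triple = Tuple[Node, str, Node]

# Verbs listed on the slide (its "etc." is not filled in).
EMOTIONAL_REACTION_VERBS = {
    "surprise", "stun", "amaze", "startle", "impress", "please", "embarrass", "annoy",
}


def property_factoring(texps: Set[Triple]) -> Set[Triple]:
    """Given <<someone1 V someone2> with something> and <something related-to someone1>,
    also add <something V someone2>."""
    result = set(texps)
    for inner, rel, thing in texps:
        if rel != "with" or not isinstance(inner, tuple):
            continue
        someone1, verb, someone2 = inner
        if verb in EMOTIONAL_REACTION_VERBS and (thing, "related-to", someone1) in texps:
            result.add((thing, verb, someone2))
    return result


# "The president impressed the country with his determination."
S1 = {
    (("president", "impress", "country"), "with", "determination"),
    ("determination", "related-to", "president"),
}
# "The president's determination impressed the country."
S2 = {
    ("determination", "impress", "country"),
    ("determination", "related-to", "president"),
}
```

After the rule fires on `S1`, its T-expressions contain all of `S2`. In this sketch, that lets a question phrased in either form match the same stored sentence.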
Sample Assertion
A chain of reactions converts each molecule of glucose into two smaller molecules of pyruvate.
[Figure: T-expressions for the assertion; legible fragments include "quantifier each" and "into molecules-5"]
Sample Query
How are the glucose molecules converted into pyruvate molecules?
[Figure: T-expressions for the query; legible fragment: "into molecules-5"]
Matching
[Figure: T-expressions from the query and T-expressions from the assertion are passed to the Matcher. Key: input processing; query processing.]
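Here is a minimal matcher sketch in Python. It makes several assumptions: query and assertion T-expressions are plain tuples, and words that begin with `?` are variables. A tiny hypothetical lemma table lets `molecules` match `molecule`. We also assume the parser turns "glucose molecules" into an `of` link. START's actual matcher also uses indexed nodes, S-rules, and lexical knowledge, and this sketch models none of them.

```python
from typing import Dict, List, Optional, Union

Node = Union[str, tuple]
Env = Dict[str, Node]

LEMMA = {"molecules": "molecule", "reactions": "reaction"}  # hypothetical


def norm(word: str) -> str:
    return LEMMA.get(word, word)


def same(a: Node, b: Node) -> bool:
    """Structural equality up to lemmatization."""
    if isinstance(a, tuple) and isinstance(b, tuple):
        return len(a) == len(b) and all(same(x, y) for x, y in zip(a, b))
    if isinstance(a, str) and isinstance(b, str):
        return norm(a) == norm(b)
    return False


def unify(pattern: Node, fact: Node, env: Env) -> Optional[Env]:
    """Extend env so that pattern matches fact, or return None."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in env:
            return env if same(env[pattern], fact) else None
        return {**env, pattern: fact}
    if isinstance(pattern, tuple) and isinstance(fact, tuple):
        if len(pattern) != len(fact):
            return None
        for p, f in zip(pattern, fact):
            env = unify(p, f, env)
            if env is None:
                return None
        return env
    return env if same(pattern, fact) else None


def match(query: List[tuple], assertion: List[tuple], env: Optional[Env] = None) -> Optional[Env]:
    """Backtracking search: every query triple must match some assertion triple."""
    env = {} if env is None else env
    if not query:
        return env
    for fact in assertion:
        extended = unify(query[0], fact, env)
        if extended is not None:
            result = match(query[1:], assertion, extended)
            if result is not None:
                return result
    return None


ASSERTION = [
    (("chain", "convert", "molecule"), "into", "molecules"),
    ("chain", "of", "reactions"),
    ("molecule", "quantifier", "each"),
    ("molecule", "of", "glucose"),
    ("molecules", "quantity", "two"),
    ("molecules", "is", "smaller"),
    ("molecules", "of", "pyruvate"),
]

# "How are the glucose molecules converted into pyruvate molecules?"
# ?how stands for whatever performs the conversion (our assumption).
QUERY = [
    (("?how", "convert", "molecules"), "into", "molecules"),
    ("molecules", "of", "glucose"),
    ("molecules", "of", "pyruvate"),
]
```

Here `match(QUERY, ASSERTION)` binds `?how` to `chain`. In the sketch, a successful match identifies the assertion that answers the question.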
A. Reply by Generating
[Figure: the Generator produces the displayed answer]
Query: How are the glucose molecules converted into pyruvate molecules?
Answer: A chain of reactions converts each molecule of glucose into two smaller molecules of pyruvate.
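Generation runs the other way, from T-expressions back to English. The toy Python generator below reproduces the answer sentence from the triples of the sample assertion. It is our sketch, not START's generator, and its rules are assumptions:
- the main clause is the one triple whose subject is itself a triple;
- nouns without a quantifier get "a";
- objects of `of` are left bare;
- the verb is inflected by appending "s".

```python
from typing import List

TEXPS: List[tuple] = [
    (("chain", "convert", "molecule"), "into", "molecules"),
    ("chain", "of", "reactions"),
    ("molecule", "quantifier", "each"),
    ("molecule", "of", "glucose"),
    ("molecules", "quantity", "two"),
    ("molecules", "is", "smaller"),
    ("molecules", "of", "pyruvate"),
]


def noun_phrase(head: str, texps: List[tuple]) -> str:
    """Render a noun with its quantifier or quantity, adjectives, and of-phrases."""
    quant = next((o for s, r, o in texps if s == head and r in ("quantifier", "quantity")), "a")
    adjectives = [o for s, r, o in texps if s == head and r == "is"]
    of_phrases = [f"of {o}" for s, r, o in texps if s == head and r == "of"]
    return " ".join([quant, *adjectives, head, *of_phrases])


def generate(texps: List[tuple]) -> str:
    """Produce one English sentence from the main clause and its modifiers."""
    (subj, verb, obj), prep, pobj = next(t for t in texps if isinstance(t[0], tuple))
    words = [noun_phrase(subj, texps), verb + "s", noun_phrase(obj, texps),
             prep, noun_phrase(pobj, texps)]
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."
```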
Reply by Generating: Example
B. Reply from annotation
[Figure: find resource, then display the answer]
Query: Show me a picture of Cog.
Reply from annotation: Example
C. Reply from annotation with script
[Figure: find resource, run script, then display the answer]
Query: Who directed Gone with the Wind?
- Script:
  - get http://us.imdb.com/Details?0031381
  - match regexp...
Answer: Gone with the Wind (1939) was directed by George Cukor, Victor Fleming, and Sam Wood.
Source: The Internet Movie Database (IMDb)
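The script on this slide fetches a page and applies a regular expression. A Python sketch of that idea follows. The slide elides the real regexp ("match regexp..."), so `DIRECTOR_RE` is a hypothetical pattern. The HTML in the usage example is made up rather than taken from IMDb's markup, and the old URL may no longer serve this page.

```python
import re
import urllib.request
from typing import List, Optional

SCRIPT_URL = "http://us.imdb.com/Details?0031381"  # from the slide

# Hypothetical pattern; the slide does not show the real one.
DIRECTOR_RE = re.compile(r"Directed by:\s*(.+?)\s*</", re.IGNORECASE)


def fetch(url: str) -> str:
    """The 'get' step: download the page as text."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_directors(page: str) -> Optional[List[str]]:
    """The 'match regexp' step: pull out a comma-separated list of names."""
    m = DIRECTOR_RE.search(page)
    if m is None:
        return None
    return [name.strip() for name in m.group(1).split(",")]


def format_answer(title: str, directors: List[str]) -> str:
    """Render names as 'A', 'A and B', or 'A, B, and C'."""
    if len(directors) == 1:
        names = directors[0]
    elif len(directors) == 2:
        names = " and ".join(directors)
    else:
        names = ", ".join(directors[:-1]) + ", and " + directors[-1]
    return f"{title} was directed by {names}."


# Usage with made-up markup (not IMDb's real page):
sample = "<p>Directed by: George Cukor, Victor Fleming, Sam Wood</p>"
```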
Reply from annotation with script: Example
Uniform Access
[Figure: START and Omnibase mediate between NL questions and data sources (IMDb, U.S. Census, Webster, POTUS, NASA); labels include "Queries", "Data", and "Multimedia responses".]
- Local knowledge base of ternary expressions
- Core vocabulary
- Uniform interface to multiple database formats (Web, text, etc.)
- Integration time independent of the size of the database
- Extended lexicon
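One way to picture a uniform interface over heterogeneous sources is a single lookup on (object, property) that hides which backend answers. The Python sketch below is hypothetical throughout: the backend functions, the catalog, and the property names are ours, and the two facts come from earlier slides. Each backend is queried on demand, so adding a source means registering one function. That is one possible reading of why integration time need not grow with database size.

```python
from typing import Callable, Dict, Optional

Backend = Callable[[str, str], Optional[str]]


def imdb_backend(obj: str, prop: str) -> Optional[str]:
    """Stand-in for a wrapper that would query IMDb on demand."""
    facts = {("Gone with the Wind", "director"): "George Cukor, Victor Fleming, and Sam Wood"}
    return facts.get((obj, prop))


def nasa_backend(obj: str, prop: str) -> Optional[str]:
    """Stand-in for a wrapper over a NASA site."""
    facts = {("Mars", "year-length"): "687 Earth days"}
    return facts.get((obj, prop))


# Hypothetical catalog: which source knows about which object.
CATALOG: Dict[str, Backend] = {
    "Gone with the Wind": imdb_backend,
    "Mars": nasa_backend,
}


def lookup(obj: str, prop: str) -> Optional[str]:
    """Uniform entry point: callers never see which backend answered."""
    backend = CATALOG.get(obj)
    return backend(obj, prop) if backend is not None else None
```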
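How START works
[Figure: START architecture. English questions from a Web browser go through the Parser to produce input T-exps; the Matcher compares them with T-exps from the KB (a database of T-exps holding native knowledge, plus annotations); scripts reach Omnibase (external knowledge: POTUS, IMDb, U.S. Census, World Factbook, WWW); the Generator produces English, which is returned to the browser as HTML.]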
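The flow in this diagram, together with the three reply modes shown earlier, can be sketched as a single dispatch function. Everything here is a hypothetical stand-in. `parse` only normalizes the string instead of producing T-expressions, and a dictionary lookup plays the Matcher. A lambda plays a script, and the HTML wrapping is minimal.

```python
import html
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class KBEntry:
    answer: str = ""                              # generated sentence or resource text
    script: Optional[Callable[[], str]] = None    # set when the annotation runs a script


def parse(question: str) -> str:
    """Stand-in for Parser -> input T-exps: just normalize the string."""
    return question.lower().rstrip("?. ")


def answer_question(question: str, kb: Dict[str, KBEntry]) -> str:
    entry = kb.get(parse(question))          # stand-in for the Matcher
    if entry is None:
        text = "No answer found."
    elif entry.script is not None:
        text = entry.script()                # reply from annotation with script
    else:
        text = entry.answer                  # reply by generating, or from annotation
    return f"<p>{html.escape(text)}</p>"     # returned to the Web browser as HTML


KB = {
    "how long is a year on mars": KBEntry("... one Mars year lasts 687 Earth days."),
    "who directed gone with the wind": KBEntry(
        script=lambda: "Gone with the Wind (1939) was directed by "
                       "George Cukor, Victor Fleming, and Sam Wood."),
}
```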
Multi-Modal Interaction
Q. "I'd like to speak to Trevor."
Q. "Is Trevor in his office?"
A. "Trevor is in his office, but he is on the phone."
A. "Trevor is in his office, but he is talking to Boris now."
A. "Trevor is in his office; however, he doesn't want to be disturbed until 2pm."