Title: IKRAFT: Interactive Knowledge Representation and Acquisition from Text
1. IKRAFT: Interactive Knowledge Representation and Acquisition from Text
- Yolanda Gil
- Varun Ratnakar
- www.isi.edu/expect/projects/trellis
- trellis.semanticweb.org
- USC/Information Sciences Institute
- gil_at_isi.edu
2. Motivation: How KBs Are Built Today
Domain Expert
Read/ask/study/listen...
Knowledge Engineer
analyze/group/index...
Knowledge Acquisition Tools
structure/relate/fit...
KB
reason/deduce/solve
3. Motivation: The Aftermath of Knowledge Base Development
Domain Expert
Knowledge Engineer
TRASH
Knowledge Acquisition Tools
KB
reason/deduce/solve
4. Motivation: Capturing the Design of Knowledge Bases
Richer representations, more ambiguous, more versatile
Introductory texts, expert hints, explanations,
dialogues, comments, examples, exceptions,...
Info. extraction templates, dialogue segments and
pegs, filled-out forms, high-level
connections,...
Knowledge Base
Descriptions augmented with prototypical
examples and exceptions, problem-solving steps
and substeps, ...
More formal, more concrete, more introspectible
Alternative formalizations (KIF, MELD, RDF, ...),
alternative views of the same notion (e.g.,
what is a threat)
((( )) ())))
(defconcept bridge ()))
5. Claims
- Knowledge can be reused at any level of (in)formality
- Knowledge can be extended more easily
- Additional documents and semi-formal structures are readily available
- Knowledge can be translated and integrated at any level to facilitate interoperability
- KR languages can be a straitjacket for some kinds of knowledge
- Intelligent systems will provide better justifications
- Many users want to know where axioms came from before they trust a system's reasoning
- Content providers will not need to be sophisticated programmers/knowledge engineers
- May be easier for end users to organize knowledge rather than formalize it
- Good symbiosis of sophisticated and unsophisticated users
6. An Example: Building a Knowledge Base from a Textbook (DARPA Rapid Knowledge Formation, RKF)
- The first step a cell takes in reading out part
of its genetic instructions is to copy the
required portion of the nucleotide sequence of
DNA (the gene) into a nucleotide sequence of
RNA. The process is called transcription because
the information, though copied into another
chemical form, is still written in essentially
the same language: the language of nucleotides.
Like DNA, RNA is a linear polymer made of four
different types of nucleotide subunits linked
together by phosphodiester bonds. It differs from
DNA chemically in two respects: (1) the
nucleotides in RNA are ribonucleotides, that is,
they contain the sugar ribose (hence the name
ribonucleic acid) rather than deoxyribose; (2)
although, like DNA, RNA contains the bases
adenine (A), guanine (G), and cytosine (C), it
contains uracil (U) instead of the thymine (T) in
DNA. Since U, like T, can base-pair by
hydrogen-bonding with A, the base-pairing
properties described for DNA also apply to RNA.
-- Essential Cell Biology, Alberts et al. 1992
7. Protein Synthesis in RKF's SHAKEN, Authored by a Biologist [Chaudri et al. 2001]
8. Step 1: Selecting Relevant Knowledge Fragments
- The first step a cell takes in reading out part
of its genetic instructions is to copy the
required portion of the nucleotide sequence of
DNA (the gene) into a nucleotide sequence of
RNA. The process is called transcription because
the information, though copied into another
chemical form, is still written in essentially
the same language: the language of nucleotides.
Like DNA, RNA is a linear polymer made of four
different types of nucleotide subunits linked
together by phosphodiester bonds. It differs from
DNA chemically in two respects: (1) the
nucleotides in RNA are ribonucleotides, that is,
they contain the sugar ribose (hence the name
ribonucleic acid) rather than deoxyribose; (2)
although, like DNA, RNA contains the bases
adenine (A), guanine (G), and cytosine (C), it
contains uracil (U) instead of the thymine (T) in
DNA. Since U, like T, can base-pair by
hydrogen-bonding with A, the base-pairing
properties described for DNA also apply to RNA.
-- Essential Cell Biology, Alberts et al. 1992
9. Step 2: Composing Stylized Knowledge Fragments
- ribose
  - it is a kind of sugar, like deoxyribose
  - it is contained in the nucleotides of RNA
- uracil
  - it is a kind of nucleotide, like adenine and guanine
  - it can base-pair with adenine
- RNA
  - it is a kind of nucleic acid, like DNA
  - it contains uracil instead of thymine
  - it is single-stranded
  - it folds in complex 3-D shapes
  - nucleotides are linked with phosphodiester bonds, like DNA
  - there are many types of RNA
  - RNA is the template for synthesizing protein
  - its nucleotides contain the sugar ribose (DNA has deoxyribose)
- gene
  - subsequence of DNA that can be used as a template to create protein
- protein synthesis
  - non-destructive creation process: RNA and protein created from DNA
10. Step 3: Creating Knowledge Base Items
(defconcept uracil :is-primitive nucleotide
  :constraints (:the base-pair adenine))
(defconcept RNA
  :is (:and nucleic-acid
            (:some contains uracil)))
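The three steps above leave a trail from each knowledge base item back to the text it came from. The following Python sketch shows one way such a trail could be recorded. The class and field names (`SourceFragment`, `Statement`, `KBItem`, `justify`) are hypothetical illustrations and are not taken from IKRAFT itself; the formal expression is kept as an abbreviated, opaque string.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class SourceFragment:
    """Step 1: a span of text selected from a source document."""
    document: str
    text: str


@dataclass
class Statement:
    """Step 2: a stylized statement, grounded in one or more fragments."""
    subject: str
    text: str
    sources: List[SourceFragment] = field(default_factory=list)


@dataclass
class KBItem:
    """Step 3: a formal expression, backed by the statements it encodes."""
    expression: str  # opaque string; this sketch does not parse it
    statements: List[Statement] = field(default_factory=list)

    def justify(self) -> List[str]:
        """Return one line per (statement, source) pair behind this item."""
        lines = []
        for st in self.statements:
            for src in st.sources:
                lines.append(
                    f'{st.subject}: {st.text} <- "{src.text}" ({src.document})'
                )
        return lines


fragment = SourceFragment(
    document="Essential Cell Biology",
    text="it contains uracil (U) instead of the thymine (T) in DNA",
)
statement = Statement(
    subject="RNA",
    text="it contains uracil instead of thymine",
    sources=[fragment],
)
item = KBItem(expression="(defconcept RNA ...)", statements=[statement])

for line in item.justify():
    print(line)
```

Keeping the links as plain object references means a user who questions the RNA definition can walk from the expression to the stylized statement and then to the textbook sentence, without any of the three levels being discarded.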
11. IKRAFT: Interactive Knowledge Representation and Acquisition from Text
- User starts with documents, extracts a small amount of information from them
- Text contains significant portions for context/reference/recall
- IKRAFT allows users to annotate text with statements, expressed in natural language
- Highlight portions of original text, annotate statement
- Statements tend to be stylized
- Statements are parsed, system generates summary of
  - Objects
  - Events/actions
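Because the statements tend to be stylized, even a very simple pattern can pull object terms out of them. The sketch below is an assumption-laden illustration in Python, not IKRAFT's parser: it recognizes only one hypothetical form, "it is a kind of X, like Y and Z", and produces a summary of objects only (events/actions are not handled). The names `KIND_OF`, `parse_kind_of`, and `summarize_objects` are invented for this example.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical pattern for one stylized form: "it is a kind of X, like Y and Z".
KIND_OF = re.compile(
    r"^it is a kind of (?P<parent>[\w\- ]+?)(?:, like (?P<siblings>.+))?$"
)


def parse_kind_of(statement: str) -> Optional[Tuple[str, List[str]]]:
    """Return (parent term, sibling terms), or None if the form does not match."""
    match = KIND_OF.match(statement.strip())
    if match is None:
        return None
    siblings_text = match.group("siblings") or ""
    siblings = [s.strip() for s in re.split(r",|\band\b", siblings_text) if s.strip()]
    return match.group("parent").strip(), siblings


def summarize_objects(subject: str, statements: List[str]) -> List[str]:
    """Collect, in first-seen order, the object terms named by matching statements."""
    objects = [subject]
    for statement in statements:
        parsed = parse_kind_of(statement)
        if parsed is None:
            continue  # unrecognized forms are skipped, not guessed at
        parent, siblings = parsed
        for term in [parent, *siblings]:
            if term not in objects:
                objects.append(term)
    return objects


print(summarize_objects("uracil", [
    "it is a kind of nucleotide, like adenine and guanine",
    "it can base-pair with adenine",
]))
# prints ['uracil', 'nucleotide', 'adenine', 'guanine']
```

The second statement does not match the pattern and is left out of the summary, which shows the cost of a single-pattern approach: each additional stylized form would need its own rule.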
12. IKRAFT: Annotating Manual Information Extraction
13. IKRAFT: Extracting Statements from Complementary/Contradictory Text Sources
14. IKRAFT: Documenting Seismic Hazard in Southern California
15. Seismic Hazard Analysis (SHA) for Southern California Earthquake Center (SCEC)
16. DOCKER: Scientist Publishes SHA Models
- User specifies
  - Types of model parameters
  - Format of input messages
  - Documentation
  - Constraints
[Figure: DOCKER architecture for publishing a model. Labels: Web Browser; User Interface; Model Specification; AS97 (docs, types, msg, constrs); Wrapper Generation (WSDL, PWL); Constraint Acquisition; AS97 ontology; SCEC ontologies]
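The four things the scientist specifies (parameter types, input message format, documentation, constraints) can be pictured as one record per published model. The Python sketch below is only an illustration of that idea: `ModelSpec`, `validate_message`, and the parameter names `magnitude` and `distance_km` are hypothetical and do not come from DOCKER or from the AS97 model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ModelSpec:
    """What a model developer supplies when publishing a model (sketch)."""
    name: str
    parameter_types: Dict[str, type]   # types of model parameters
    message_format: List[str]          # fields expected in an input message
    documentation: List[str]           # pointers to documents describing the model
    constraints: List[str] = field(default_factory=list)  # constraint statements

    def validate_message(self, message: Dict[str, object]) -> List[str]:
        """Return problems with field presence and types (empty list if none)."""
        problems = []
        for name in self.message_format:
            if name not in message:
                problems.append(f"missing field: {name}")
            elif not isinstance(message[name], self.parameter_types[name]):
                problems.append(f"wrong type for {name}")
        return problems


# Placeholder values for illustration only; not the real AS97 parameters.
spec = ModelSpec(
    name="AS97",
    parameter_types={"magnitude": float, "distance_km": float},
    message_format=["magnitude", "distance_km"],
    documentation=["(pointer to model documentation)"],
    constraints=["(constraint statement entered by the model developer)"],
)

print(spec.validate_message({"magnitude": 6.5}))
print(spec.validate_message({"magnitude": "6.5", "distance_km": 10.0}))
print(spec.validate_message({"magnitude": 6.5, "distance_km": 10.0}))
```

Here the constraints are stored as plain statements; turning some of them into checkable expressions is a separate step.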
17. Documenting the Model with IKRAFT
18. Documenting Each Constraint
19. Formalizing Simple Constraints
20. Documentation of Constraints (Some Are Formalized, Some Are Not)
21. DOCKER: Engineer Uses SHA Model
- User can
  - Browse through SHA models
  - Invoke SHA models
  - Get help in selecting appropriate model
[Figure: DOCKER architecture for using a model. Labels: Web Browser; User Interface; AS97 (docs, constrs, msg, types); Model Reasoning; Pathway Elicitation; Constraint Reasoning; AS97 ontology; Shared ontologies; KRR (Powerloom)]
22. DOCKER Detects Constraint Violations
23. Should Engineer Override Constraint Specified by Model Developer?
24. Engineer Brings Up IKRAFT to Find Reasons for the Constraint
25. Engineer Can Check Additional Model Constraints (Not Formalized)
26. Constraints Grounded on Model Documentation
27. Engineer Makes an Informed Decision on Whether to Override the Constraint
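The scenario in slides 22 to 27 can be sketched in a few lines of Python: formalized constraints are checked automatically, and every constraint, formalized or not, carries the statement and source the engineer needs in order to decide whether to override it. Everything in this sketch is hypothetical: `DocumentedConstraint` and `review` are invented names, and the bounds and source labels are placeholders, not actual properties of AS97 or any other SHA model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class DocumentedConstraint:
    statement: str   # concise natural-language statement of the constraint
    source: str      # where in the model documentation it is grounded
    check: Optional[Callable[[Dict[str, float]], bool]] = None  # None: not formalized


def review(
    inputs: Dict[str, float], constraints: List[DocumentedConstraint]
) -> Tuple[List[DocumentedConstraint], List[DocumentedConstraint]]:
    """Split constraints into (violated formal ones, ones left for manual checking)."""
    violated = [c for c in constraints if c.check is not None and not c.check(inputs)]
    unformalized = [c for c in constraints if c.check is None]
    return violated, unformalized


# Placeholder constraints for illustration only.
constraints = [
    DocumentedConstraint(
        statement="magnitude must lie between 5.0 and 8.0 (placeholder bounds)",
        source="hypothetical model documentation, section 2",
        check=lambda p: 5.0 <= p["magnitude"] <= 8.0,
    ),
    DocumentedConstraint(
        statement="model is intended for one kind of site only (placeholder)",
        source="hypothetical model documentation, section 3",
    ),
]

violated, unformalized = review({"magnitude": 4.2}, constraints)
for c in violated:
    print(f"VIOLATED: {c.statement} [{c.source}]")
for c in unformalized:
    print(f"NOT FORMALIZED, CHECK BY HAND: {c.statement} [{c.source}]")
```

The system only reports; the override decision stays with the engineer, who can read the cited source before proceeding.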
28. Discussion
- Overhead in capturing the rationale?
  - Related to motivation and payoff
  - Rationale here is captured in a very simple process
- Related Work
  - Documenting design rationale [Shum 96]
  - Methodologies for knowledge base development [Schreiber et al. 00]
  - Higher-level languages, e.g., KARL [Fensel et al. 98]
29. Conclusions and Future Work
- IKRAFT helps users document formal expressions
  - Each formal expression is backed up by a concise NL statement that is linked back to one or more sources
  - Users can understand the justification for a system's reasoning (e.g., SHA)
- Future work
  - NLP techniques to extract terms from users' concise statements
  - Controlled grammar for formulation of statements
  - Other documentation, e.g., tables, forms, exceptions
- High payoff in capturing the rationale of knowledge bases
30. Speculation: Will the (Semantic) Web End Up Looking Like This?
Richer representations, more ambiguous, more versatile
Introductory texts, expert hints, explanations,
dialogues, comments, examples, exceptions,...
Info. extraction templates, dialogue segments and
pegs, filled-out forms, high-level
connections,...
Descriptions augmented with prototypical
examples and exceptions, problem-solving steps
and substeps, ...
More formal, more concrete, more introspectible
Alternative formalizations (KIF, MELD, RDF, ...),
alternative views of the same notion (e.g.,
what is a threat)
((( )) ())))
(defconcept bridge ()))