Towards Bootstrapping Knowledge-Based Archives

Towards Self-Validating Knowledge-Based Archives, Bertram Ludäscher, Richard ... Knowledge-Based Persistent Archives, Reagan Moore, SDSC TR-2001-7, January 18, 2001 ...

1
Towards Bootstrapping Knowledge-Based Archives
  • Bertram Ludäscher
  • Richard Marciano
  • Reagan Moore
  • San Diego Supercomputer Center
  • {ludaesch,marciano,moore}@sdsc.edu

Towards Self-Validating Knowledge-Based
Archives, Bertram Ludäscher, Richard Marciano,
Reagan Moore, 11th Workshop on Research Issues in
Data Engineering (RIDE), Heidelberg, IEEE
Computer Society, April 2001
2
Archival Processes and Functions
  • Data submission/accessioning
  • loop: information producer <=> "archival
    engineer"
  • Ingestion
  • a sequence of information-preserving
    transformations is applied to submitted "raw
    data" => ingestion network
  • Migration
  • ... as time goes by ...
  • ... migrate to new physical media, maybe data
    formats, information model ...
  • "easy migration" <=> "good" archival format
    model
  • Instantiation/Access
  • revive/reanimate the archive => queryable
    collection/database
  • GOAL: preserve information!
  • (oops, records, that is)

3
What is it that we try to archive??
  • What constitutes a record?
  • -- beats me...
  • but there are hierarchies of information /
    abstractions
  • data ... information ... knowledge ...
    (aka the big picture!)
  • instance ... schema ... model ... metamodel ...
    metametamodel ...
  • linear syntax ... data structure ... data model
    ... conceptual model ...
  • Static vs. dynamic information
  • extensional data ... intensional/virtual/derived
    data (facts/rules)
  • data ... functions/programs
  • Managing complexity via layered approaches
  • "protocol stack" (cf. ISO/OSI, "Semantic Web",
    communication in general)
  • going up => index/correlate/aggregate/abstract

4
Senate Collection Example
  • What you see
  • is maybe NOT what you get (a not so well
    documented format)

5
Senate Collection Example
  • Rich Text Format (a documented Microsoft
    format)

\pard\par
\pard\b S. 345\b0\par
\pard\qr DATE INTRODUCED 02/03/1999\par
\pard SPONSOR Allard\par
\i\qc OFFICIAL TITLE\i0\par
\pard A bill to amend the Animal Welfare Act to remove the limitation that permits interstate movement of live birds, for the purpose of fighting, to States in which animal fighting is lawful.\par
\i\qc LATEST STATUS\i0\par\pard
\pard\plain \fi-1900\li1900\nowidctlpar\adjustright Feb 3, 1999\tab Read twice and referred to the Committee on Agriculture.\par
\pard
  • can be wrapped into XML

<p bold="off"> S. 345</p>
<p align="right" bold="off">DATE INTRODUCED 02/03/1999</p>
<p bold="off">SPONSOR Allard</p>
<p align="center" bold="off" italic="off">OFFICIAL TITLE</p>
<p bold="off" italic="off">A bill to amend the Animal Welfare Act to remove the limitation that permits interstate movement of live birds, for the purpose of fighting, to States in which animal fighting is lawful.</p>
<p align="center" bold="off" italic="off">LATEST STATUS</p>
<p><string>Feb 3, 1999tabRead twice and referred to the Committee on Agriculture.</string></p>
<p></p>
6
Senate Collection Example
  • the XML can be lifted from the presentation
    level

<p bold="off"> S. 345</p>
<p align="right" bold="off">DATE INTRODUCED 02/03/1999</p>
<p bold="off">SPONSOR Allard</p>
<p align="center" bold="off" italic="off">OFFICIAL TITLE</p>
<p bold="off" italic="off">A bill to amend the Animal Welfare Act to remove the limitation that permits interstate movement of live birds, for the purpose of fighting, to States in which animal fighting is lawful.</p>
<p align="center" bold="off" italic="off">LATEST STATUS</p>
<p><string>Feb 3, 1999tabRead twice and referred to the Committee on Agriculture.</string></p>
<p></p>
  • to the information level

<bill name="S.345">
  <committees>
    <committee>SENATE AGRICULTURE</committee>
  </committees>
  <date_introduced>02/03/1999</date_introduced>
  <latest_status_list>
    <latest_status>
      <ls_date>Feb 3, 1999</ls_date>
      <ls_txt>Read twice and referred to the Committee on Agriculture</ls_txt>
    </latest_status>
  </latest_status_list>
  <official_title>A bill to amend the Animal Welfare Act to remove the limitation that permits interstate movement of live birds, for the purpose of fighting, to States in which animal fighting is lawful.</official_title>
  <sponsor>Allard, Wayne CO</sponsor>
</bill>
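
The lift from the presentation level to the information level can be sketched in Python. This is only an illustration under assumptions, not the transformation used for the Senate collection: the `<doc>` root element, the `lift` function, and the label-matching heuristics are hypothetical, and the official title is abbreviated.

```python
import re
import xml.etree.ElementTree as ET

# Presentation-level input, wrapped in a hypothetical <doc> root so that it
# parses as a single XML document. The title text is shortened for brevity.
PRESENTATION = """<doc>
<p bold="off"> S. 345</p>
<p align="right" bold="off">DATE INTRODUCED 02/03/1999</p>
<p bold="off">SPONSOR Allard</p>
<p align="center" bold="off" italic="off">OFFICIAL TITLE</p>
<p bold="off" italic="off">A bill to amend the Animal Welfare Act.</p>
</doc>"""


def lift(presentation_xml: str) -> ET.Element:
    """Map presentational <p> paragraphs to an information-level <bill>."""
    root = ET.fromstring(presentation_xml)
    paragraphs = [(p.text or "").strip() for p in root.iter("p")]
    bill = ET.Element("bill")
    i = 0
    while i < len(paragraphs):
        text = paragraphs[i]
        if re.fullmatch(r"S\.\s*\d+", text):
            # Bill number such as "S. 345" becomes the name attribute "S.345".
            bill.set("name", text.replace(" ", ""))
        elif text.startswith("DATE INTRODUCED "):
            ET.SubElement(bill, "date_introduced").text = text[len("DATE INTRODUCED "):]
        elif text.startswith("SPONSOR "):
            ET.SubElement(bill, "sponsor").text = text[len("SPONSOR "):]
        elif text == "OFFICIAL TITLE" and i + 1 < len(paragraphs):
            # The label paragraph is followed by the paragraph holding the value.
            i += 1
            ET.SubElement(bill, "official_title").text = paragraphs[i]
        i += 1
    return bill


bill = lift(PRESENTATION)
print(ET.tostring(bill, encoding="unicode"))
```

The presentational attributes (`bold`, `align`, `italic`) are dropped on purpose: only the labels and values carry information at this level.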

7
Senate Collection Example
  • and consolidated across the whole collection

<bill name="S.345">
  <abstract>NONE</abstract>
  <committees>
    <committee>SENATE AGRICULTURE</committee>
  </committees>
  <cosponsors>
    <cosponsor>
      <co_name a-date="A-02/22/1999">Smith, Bob NH</co_name>
    </cosponsor>
    <cosponsor>
      <co_name a-date="A-02/22/1999">Harkin, Tom IA</co_name>
    </cosponsor>
    <cosponsor>
      <co_name a-date="A-02/22/1999">Santorum, Rick PA</co_name>
    </cosponsor>
    <cosponsor> . . .
    <cosponsor>
      <co_name a-date="A-03/29/2000">Graham, Bob FL</co_name>
    </cosponsor>
  </cosponsors>
  <date_introduced>02/03/1999</date_introduced>
  <latest_status_list>
    <latest_status>
      <ls_date>Mar 2, 2000</ls_date>
      <ls_txt>Committee on Energy and Natural Resources. Ordered to be reported without amendment favorably.</ls_txt>
    </latest_status>
    <latest_status>
      <ls_date>Feb 3, 1999</ls_date>
      <ls_txt>Read twice and referred to the Committee on Agriculture</ls_txt>
    </latest_status>
  </latest_status_list>
  <official_title>A bill to amend the Animal Welfare Act to remove the limitation that permits interstate movement of live birds, for the purpose of fighting, to States in which animal fighting is lawful.</official_title>
  <sponsor>Allard, Wayne CO</sponsor>
</bill>
8
Senate Collection Example
  • Information level schema as an XML DTD

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT bills (bill)>
<!ELEMENT bill (abstract?, committees?, congressional_record?, cosponsors?,
                date_introduced?, digest?, latest_status_list?, official_title?,
                sponsor?, statement_of_purpose?, submitted_by?, submitted_for?)>
<!ATTLIST bill name CDATA #REQUIRED>
<!ELEMENT committees (committee)>
<!ELEMENT cosponsors (cosponsor)>
<!ELEMENT digest (#PCDATA)>
<!ELEMENT latest_status_list (latest_status)>
<!ELEMENT latest_status (ls_date, ls_txt)>
<!ELEMENT abstract (#PCDATA)>
<!ELEMENT committee (#PCDATA)>
<!ELEMENT congressional_record (#PCDATA)>
<!ELEMENT cosponsor (co_name)>
<!ELEMENT co_name (#PCDATA)>
<!ATTLIST co_name a-date CDATA #IMPLIED>
<!ELEMENT date_introduced (#PCDATA)>
<!ELEMENT ls_date (#PCDATA)>
<!ELEMENT ls_txt (#PCDATA)>
<!ELEMENT official_title (#PCDATA)>
<!ELEMENT sponsor (#PCDATA)>
<!ELEMENT statement_of_purpose (#PCDATA)>
<!ELEMENT submitted_by (#PCDATA)>
<!ELEMENT submitted_for (#PCDATA)>
9
Beyond XML DTDs: adding knowledge/derived/virtual
information via (logic) rules
  • Data provider says:
  • Please archive all records of legislative
    activities of the 106th Senate!
  • Integrity constraints, e.g.:
  • (1) senators_with_file = UNION(sponsor,
    cosponsors, submitted_by)
  • (2) senators = sponsors + co-sponsors
  • Violation:
  • the rhs is a SUPERSET of the lhs!
  • Exceptions:
  • (Chafee, John), (Gramm, Phil), (Miller, Zell)
  • (Possible) explanations:
  • senators who joined (Zell), passed away (Chafee),
    were forgotten (Gramm)!?
  • Checking ICs:
  • IF sponsor(X), not senator(X) THEN
    ADD(exception_log, missing_senator_info(X))
  • IF condition THEN action
  • Action: LOG, WARN, ABORT, ...
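
The "IF condition THEN action" rule for checking integrity constraints could look roughly like this in Python. It is a sketch with toy data: the two sets are illustrative (only the three exception names come from the slide), and `check_sponsors` is a hypothetical helper standing in for a declarative rule.

```python
def check_sponsors(senators, sponsors):
    """IF sponsor(X), not senator(X) THEN ADD(exception_log, missing_senator_info(X))."""
    exception_log = []
    # Set difference: everyone who sponsors but has no senator file.
    for name in sorted(sponsors - senators):
        exception_log.append(("missing_senator_info", name))
    return exception_log


# Toy data (assumption): senators with a file vs. names found as sponsors.
senators_with_file = {"Allard, Wayne", "Harkin, Tom", "Smith, Bob"}
sponsors_found = {"Allard, Wayne", "Chafee, John", "Gramm, Phil", "Miller, Zell"}

log = check_sponsors(senators_with_file, sponsors_found)
for entry in log:
    print("LOG:", entry)
```

Here the action is LOG; a WARN or ABORT action would replace the `append` with a warning or a raised exception.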

10
OAIS (Open Archival Information System)
Information Model
  • info(rmation)_object = data_object +
    representation_info
  • data_object = digital_object | physical_object
  • digital_object = bits
  • representation_info = structure_info +
    semantic_info
  • info_object is_interpreted_using
    representation_info
  • an AIP (archival information package) contains
    content info_objects + PDI (preservation
    description information)
  • knowledge-level extension:
  • data objects (e.g., RTF/HTML/... formatted
    objects)
  • wrapping/tagging => information objects (e.g.,
    XML docs + DTD/Schema)
  • knowledge extraction/semantic annotation =>
    semantic/conceptual objects (e.g., declarative OO
    model + rules)

11
Ingestion Networks
Transformation t is information preserving if
there is an inverse transformation t_inv such that
for all d in dom(t): t_inv(t(d)) = d.
  • asking for "=" at the level of raw (unwrapped)
    data may be too strict
  • => lift to the information level: make sure
    information is preserved there
  • e.g., mapping back to HTML using XSL(T) can give
    the same "look and feel" as the raw data but
    presentational HTML "noise" (irregularities) is
    removed
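
The definition of an information-preserving transformation can be checked on sample data with a round trip. The sketch below is an assumption-laden illustration: `wrap`, `unwrap`, and the sample records are hypothetical, and passing on a finite sample is evidence, not proof, that t_inv(t(d)) = d holds on all of dom(t).

```python
import xml.etree.ElementTree as ET


def is_information_preserving(t, t_inv, domain):
    """Return True if t_inv(t(d)) == d for every d in the given sample."""
    return all(t_inv(t(d)) == d for d in domain)


def wrap(record):
    """t: wrap a (tag, text) record into an XML string."""
    tag, text = record
    elem = ET.Element(tag)
    elem.text = text
    return ET.tostring(elem, encoding="unicode")


def unwrap(xml_text):
    """t_inv: recover the (tag, text) record from the XML string."""
    elem = ET.fromstring(xml_text)
    return (elem.tag, elem.text)


SAMPLE = [("ls_date", "Feb 3, 1999"), ("sponsor", "Allard, Wayne CO")]

# Wrapping into XML can be undone exactly on this sample.
print(is_information_preserving(wrap, unwrap, SAMPLE))

# Upper-casing loses the original capitalization, so lower-casing is not an inverse.
to_upper = lambda r: (r[0], r[1].upper())
to_lower = lambda r: (r[0], r[1].lower())
print(is_information_preserving(to_upper, to_lower, SAMPLE))
```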

12
Ingestion Network: Senate Collection
13
From XML-Based to Knowledge-Based Archives...
  • XML/collection-based archival: save data "as is"
    plus...
  • ... separate content from presentation
  • ... tag your data (take a lift in the info
    hierarchy)
  • ... use a self-describing, semistructured data
    format (XML)
  • Knowledge-based archival: add ...
  • ... conceptual level information
  • ... integrity constraints
  • ... explanations/derivation rules
  • archiving only results y=f(x) vs. archiving the
    rules/function "f" (e.g. f = the
    Florida procedure...)
  • knowledge representation (rules) = metadata on
    steroids ...

14
... to Self-Validating, Self-Instantiating
Knowledge-Based Archives
  • Goal: self-contained archives
  • Limitations: how much context can you drag into
    your archive to make it self-contained??
    (Dublin Core the world)
  • Using open, infrastructure-independent
    representations...
  • => make the archive as self-contained as you can
    ...
  • pay for

15
Maximizing Self-Containedness
  • Self-validating archives: add ...
  • ... "executable knowledge" (rules)
  • "helping (bugging?) the data provider"
  • => add the functionality and meaning of DTD
    (Schema/IC...) validation to the AIP
  • => package the validator!
  • Self-instantiating archives: add ...
  • ... "executable ingestion process"
  • helping the archival engineer (aka archivist)
  • here is looking over your shoulder
  • => add the functionality of database
    transformations to the AIP
  • => package the transformers!
  • BUT: packaging validators and transformers
    increases infrastructure dependence!

16
Towards Bootstrapping Knowledge-Based Archives
  • enable addition of semantic annotations
    ("knowledge") via logic rules to AIPs
  • add executable specifications of semantics
    => AIP + KP (knowledge package,
    i.e., rules)
  • => self-validating archive
  • add executable specifications of the ingestion
    network => AIP + IN (ingestion network, ...more
    rules)
  • => self-instantiating archive

  • => a bootstrapping knowledge-based archive with
    DTD/Schema/IC validation and ingestion
    transformations all expressed in a declarative
    logic program
  • from the 2do list: build a prototype (BARON:
    Bootstrapping Archive of Rules, Ontologies, and
    ingestion Networks)

Baron von Münchhausen, pulling himself out of the
swamp
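
The AIP + KP + IN idea can be sketched as a small data structure. This is not the BARON prototype: the slides call for a declarative logic program, whereas the sketch below swaps in plain Python callables, and `BootstrappingAIP`, `strip_whitespace`, and `sponsor_required` are hypothetical names with toy data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, str]
Rule = Callable[[List[Record]], List[str]]          # returns exception messages
Transform = Callable[[List[Record]], List[Record]]  # one ingestion step


@dataclass
class BootstrappingAIP:
    content: List[Record]
    knowledge_package: List[Rule] = field(default_factory=list)       # KP
    ingestion_network: List[Transform] = field(default_factory=list)  # IN

    def instantiate(self) -> List[Record]:
        """Self-instantiating: replay the packaged ingestion steps in order."""
        data = self.content
        for step in self.ingestion_network:
            data = step(data)
        return data

    def validate(self) -> List[str]:
        """Self-validating: run the packaged rules on the instantiated data."""
        data = self.instantiate()
        log: List[str] = []
        for rule in self.knowledge_package:
            log.extend(rule(data))
        return log


def strip_whitespace(records: List[Record]) -> List[Record]:
    return [{k: v.strip() for k, v in r.items()} for r in records]


def sponsor_required(records: List[Record]) -> List[str]:
    return [f"record {i}: missing sponsor"
            for i, r in enumerate(records) if not r.get("sponsor")]


aip = BootstrappingAIP(
    content=[{"sponsor": " Allard "}, {"sponsor": "  "}],
    knowledge_package=[sponsor_required],
    ingestion_network=[strip_whitespace],
)
print(aip.instantiate())
print(aip.validate())
```

Packaging callables this way also shows the trade-off noted earlier: the archive now depends on a Python interpreter, which is exactly the kind of infrastructure dependence a declarative representation tries to limit.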
17
Getting your hands dirty with logic rules
  • Some logic rules for reassembling the doc
    structure (lexical scopes) from the OAV (or
    rather AOV)

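% Attr has value Attr_val from line LN up to its next occurrence at line LN1.
attr_interval(Attr, SID, Attr_val, LN, LN1) :-
    oav(Attr, (SID, LN), Attr_val),
    oav(Attr, (SID, LN1), _), LN1 > LN,
    not attr_between(Attr, SID, LN, LN1).
% Attr occurs again at some line LN2 strictly between LN and LN1.
attr_between(Attr, SID, LN, LN1) :-
    oav(Attr, (SID, LN), _), oav(Attr, (SID, LN1), _),
    oav(Attr, (SID, LN2), _),
    LN < LN2, LN2 < LN1.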
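
Read procedurally, the two rules pair each occurrence of an attribute with its next occurrence in the same document. The Python sketch below mirrors that reading; the `attr_intervals` function and the sample facts are hypothetical, and the tuple layout (attr, (sid, ln), value) is assumed from the `oav` predicate.

```python
def attr_intervals(oav):
    """Compute attr_interval(Attr, SID, Attr_val, LN, LN1) from oav facts.

    oav is a set of (attr, (sid, ln), value) tuples. LN1 is the smallest line
    number greater than LN at which the same attribute occurs in the same
    document, which is what "not attr_between" enforces in the rules.
    """
    result = set()
    for attr, (sid, ln), val in oav:
        later = [ln1 for a, (s, ln1), _ in oav
                 if a == attr and s == sid and ln1 > ln]
        if later:
            result.add((attr, sid, val, ln, min(later)))
    return result


# Hypothetical facts: "bold" switches at lines 1, 4 and 9 of document d1.
facts = {
    ("bold", ("d1", 1), "on"),
    ("bold", ("d1", 4), "off"),
    ("bold", ("d1", 9), "on"),
    ("italic", ("d1", 2), "on"),
}

for interval in sorted(attr_intervals(facts)):
    print(interval)
```

The last occurrence of an attribute has no successor, so it yields no interval, just as in the logic rules.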
18
Summary: what is the declarative (logic) approach?
  • Use of declarative database and knowledge
    representation formalisms for...
  • adding knowledge packages to AIPs
  • capture context known at the time of archival
    using conceptual models of collections, integrity
    constraints, virtual relations,
  • applying them at ingestion (aka bringing-in),
    migration, and instantiation/access time
  • (wrapping, transforming, querying
    collections)

19
References
  • Towards Self-Validating Knowledge-Based
    Archives, Bertram Ludäscher, Richard Marciano,
    Reagan Moore, 11th Workshop on Research Issues in
    Data Engineering (RIDE), Heidelberg, IEEE
    Computer Society, April 2001, SDSC TR-2001-1,
    January 18, 2001.
  • Knowledge-Based Persistent Archives, Reagan
    Moore, SDSC TR-2001-7, January 18, 2001
  • The Senate Legislative Activities Collection
    (SLA): a Case Study, Infrastructure Research to
    Support Preservation Strategies, Richard
    Marciano, Bertram Ludäscher, Reagan Moore, SDSC
    TR-2001-5, January 18, 2001
  • Reference Model for an Open Archival Information
    System (OAIS), Draft Recommendation, Consultative
    Committee for Space Data Systems, CCSDS
    650.0-R-1, May 1999.
  • Digital Rosetta Stone: A Conceptual Model for
    Maintaining Long-term Access to Digital
    Documents, Alan R. Heminger, Steven B. Robertson