Title: Archiving Web Resources Information Day
1 Archiving Web Resources Information Day
- Xinq: XML Inquire
- search/browse access to archived databases
- Monica Berko, NLA
- 12th November, 2004
2 Why was Xinq developed?
- Our method for archiving a deep web site has been to acquire the back-end database and transform the content to XML
- We now have XML file(s) and possibly a schema
- How do we provide access to the content that emulates the original interface on the live web site?
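The database-to-XML step described on this slide can be sketched as follows. This is a minimal illustration only, not the extraction process used for Xinq: the in-memory SQLite database, the `publications` table, its columns, and the helper `table_to_xml` are all assumptions made up for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn, table, item_name):
    """Serialise every row of `table` as one <item_name> element (hypothetical helper)."""
    root = ET.Element(table)
    # The table name is interpolated directly, so it must come from a trusted source.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        item = ET.SubElement(root, item_name)
        for column, value in zip(columns, row):
            if value is not None:  # omit empty columns rather than emit empty elements
                ET.SubElement(item, column).text = str(value)
    return root


# Invented sample data standing in for a site's back-end database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE publications (title TEXT, type TEXT)")
conn.execute("INSERT INTO publications VALUES ('Rural Health Guide', 'report')")

xml_text = ET.tostring(
    table_to_xml(conn, "publications", "publication"), encoding="unicode"
)
print(xml_text)
```

A real extraction would also need the filtering and mapping decisions that the administrative metadata is meant to record.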
3 Examples
- Health Education Rural Remote Resources Database
- Plant Breeders Rights Database
- Australian Medical Pioneers Index
- Australia Dancing
- The Last Resting Place of Australian Artists
- Soldiers of the South African War
4 Problems to be solved
- How to describe the data model and semantics of the deposited database
- What administrative metadata is required
- How to describe the behaviour of the access interface
- Select a platform for building web interfaces to XML data stores that is open-source, popular, easily deployed and scalable
- How to automatically generate a web-based search/browse interface to an arbitrary XML data archive in this platform
5 Archival Information Package
- XML configuration file storing the required administrative information, data model description, and access/display rules
- XML Schema for the Archive Contents
- The archive contents (XML only)
6 Administrative Metadata
- Descriptive: Title, Description, Publisher, Live URL
- If the database describes digital objects which are also to be fetched and archived, the field(s) containing the URI reference must be specified
- Extraction: details about the original database and the filtering and mapping processes used to extract the data to be archived
- Ingestion: details about ingestion of the XML archive into the repository, and of the corresponding digital objects if there are any
7 Describing the Data Model
- An XML schema is difficult for non-technical staff to author
- Semantic information is required for reproducing a usable web interface
- The data model must cope with multi-item models and item relationships
- The description must be expressed as XML
- The Xinq tool generates an XML schema for each item described in the data model
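To make the last point concrete, here is a rough sketch of generating an XML Schema from a simple item description. It does not reproduce Xinq's generator or its archive description format: the function `item_schema`, the flat list of property names, and the choice to type every property as an optional `xs:string` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
# Serialise the schema namespace with the conventional "xs" prefix.
ET.register_namespace("xs", XS)


def item_schema(item_name, properties):
    """Build a minimal schema for one item whose properties are all optional strings."""
    schema = ET.Element(f"{{{XS}}}schema")
    element = ET.SubElement(schema, f"{{{XS}}}element", name=item_name)
    complex_type = ET.SubElement(element, f"{{{XS}}}complexType")
    sequence = ET.SubElement(complex_type, f"{{{XS}}}sequence")
    for prop in properties:
        ET.SubElement(
            sequence,
            f"{{{XS}}}element",
            name=prop,
            type="xs:string",  # assumption: no richer typing in this sketch
            minOccurs="0",
        )
    return schema


# Hypothetical item with three properties.
schema = item_schema("publication", ["title", "type", "author"])
schema_text = ET.tostring(schema, encoding="unicode")
print(schema_text)
```

The point of the design on this slide is that curators write the simpler description and never have to author the schema by hand.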
8 Single Item and Multi-Item Sites
- Many dynamic web sites describe only one entity
- Some sites are obviously based around multiple but related entities, and the database archive description will need to reflect this, e.g. Health Education Rural Remote Resources Database
9 Arbitrary database model
10 Simple Multi-Item Example
11 Example Data Model Definition
- http://www.nla.gov.au/xinq/documents/examples/pub2_archive-spec.xml
12 Data Model definition schema
14 The Access Interface
- A home page with a search form
- A search results display page for each item type
- A detailed display page for all the properties of an item
- Browse options which mimic the browse options available on the original site
- A default header and footer file which includes the name of the publisher and the URL of the original site
15 Example
- Health Education Rural Remote Resources Database
- Live Site
- Archived Site
16 Describing the access interface
- Search rules
- Browse rules
- Display rules
17 Search Rules
18 Search rules example
<search_rules>
  <search>
    <entity>publication</entity>
    <field>title</field>
    <field>type</field>
    <field>author<subfield>familyname</subfield></field>
    <field>author<subfield>givenname</subfield></field>
    <field>author<subfield>deceased</subfield></field>
  </search>
  <search>
    <entity>person</entity>
    <field>familyname</field>
    <field>givenname</field>
    <field>deceased</field>
    <field>birthdate</field>
  </search>
</search_rules>
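As an illustration of how rules like these could drive a generated search form, the sketch below reads a shortened copy of the search rules and lists the searchable field paths for each entity. It is not Xinq's own code: the function `searchable_paths` and the `field/subfield` path notation are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the search rules example.
SEARCH_RULES = """
<search_rules>
  <search>
    <entity>publication</entity>
    <field>title</field>
    <field>author<subfield>familyname</subfield></field>
  </search>
  <search>
    <entity>person</entity>
    <field>familyname</field>
    <field>birthdate</field>
  </search>
</search_rules>
"""


def searchable_paths(rules_xml):
    """Map each entity name to the relative paths of its searchable fields."""
    paths = {}
    for search in ET.fromstring(rules_xml).findall("search"):
        entity = search.findtext("entity").strip()
        entity_paths = []
        for field in search.findall("field"):
            # field.text holds the field name; any subfield is appended as a path step.
            parts = [field.text.strip()]
            parts += [sub.text.strip() for sub in field.findall("subfield")]
            entity_paths.append("/".join(parts))
        paths[entity] = entity_paths
    return paths


paths = searchable_paths(SEARCH_RULES)
print(paths)
```

Each path could then become one input on the search form for its entity, or one predicate in a query against the XML store.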
19 Browse Rules
20 Browse Rules Example
<browse_rules>
  <browse>
    <entity>publication</entity>
    <field>title</field>
    <field>type</field>
    <field>author<subfield>familyname</subfield></field>
  </browse>
</browse_rules>
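One plausible reading of a browse rule is "list the distinct values of this field so the user can pick one". The sketch below shows that idea against a tiny invented archive. The sample records and the function `browse_values` are hypothetical, and Xinq itself runs against an XML database rather than an in-memory document.

```python
import xml.etree.ElementTree as ET

# Invented archive contents for illustration only.
ARCHIVE = """
<publications>
  <publication><title>Bush Nursing</title><type>report</type></publication>
  <publication><title>Remote Clinics</title><type>video</type></publication>
  <publication><title>Asthma Care</title><type>report</type></publication>
</publications>
"""


def browse_values(archive_xml, entity, field_path):
    """Return the sorted distinct values of `field_path` across all `entity` items."""
    values = set()
    for item in ET.fromstring(archive_xml).iter(entity):
        for node in item.findall(field_path):
            if node.text and node.text.strip():
                values.add(node.text.strip())
    return sorted(values)


types = browse_values(ARCHIVE, "publication", "type")
titles = browse_values(ARCHIVE, "publication", "title")
print(types)
print(titles)
```

A subfield rule such as author with subfield familyname would correspond to a path like `author/familyname` in this sketch.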
21 Display Rules
22
<display_rules>
  <resultsummary>
    <entity>resource</entity>
    <results-style>table</results-style>
    <field sortorder="ascending" link="true">title</field>
    <field>provider<subfield>name</subfield></field>
    <field>provider<subfield>state</subfield></field>
  </resultsummary>
  <resultsummary>
    <entity>provider</entity>
    <results-style>list</results-style>
    <field sortorder="ascending" link="true">name</field>
    <field>state</field>
    <field>contact</field>
    <field>phone</field>
    <field>email</field>
  </resultsummary>
</display_rules>
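A results summary rule of this shape can be read as "show these fields, sorted by the field that carries `sortorder`". The sketch below applies a shortened provider rule to invented records. The function `summarise`, the sample providers, and the handling of a `descending` value are assumptions, since the slides only show `ascending`.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the provider result summary rule.
DISPLAY_RULE = """
<resultsummary>
  <entity>provider</entity>
  <results-style>list</results-style>
  <field sortorder="ascending" link="true">name</field>
  <field>state</field>
</resultsummary>
"""

# Invented records; phone is present in the data but not in the rule.
PROVIDERS = """
<providers>
  <provider><name>Outback Health</name><state>NT</state><phone>555 0100</phone></provider>
  <provider><name>Coastal Care</name><state>QLD</state><phone>555 0101</phone></provider>
</providers>
"""


def summarise(rule_xml, archive_xml):
    """Return (field names, rows) for the entity named in one resultsummary rule."""
    rule = ET.fromstring(rule_xml)
    entity = rule.findtext("entity").strip()
    fields = rule.findall("field")
    names = [field.text.strip() for field in fields]
    rows = [
        [item.findtext(name, default="") for name in names]
        for item in ET.fromstring(archive_xml).iter(entity)
    ]
    # Sort by the first field that declares a sortorder attribute, if any.
    for index, field in enumerate(fields):
        order = field.get("sortorder")
        if order:
            rows.sort(key=lambda row: row[index], reverse=(order == "descending"))
            break
    return names, rows


names, rows = summarise(DISPLAY_RULE, PROVIDERS)
print(names)
print(rows)
```

Fields that the rule does not list, such as phone here, are left out of the summary and would only appear on the detailed display page.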
23 More Examples
- Plant Breeders Rights Database
- Australian Medical Pioneers Index
- The Last Resting Place of Australian Artists
- Soldiers of the South African War
24 Required Infrastructure
- Native XML database server which supports XQuery and the XML:DB API (eXist and Tamino have been tested)
- Java servlet container (Tomcat and Jetty have been tested)
- Apache Ant
- Xalan XSLT processor
25 Limitations of the tool
- Does not validate item relationships
- Does not deal with nested property groups
- Does not yet properly handle nested references
- Archive description file needs to be authored from scratch by the curator
- Has limited free text search capability
- Has no advanced search interface
- Has no map interface for querying by physical location
- Not integrated with the archival of digital objects referenced in the database
26 Roadmap
- Release on SourceForge, February 2005
- Some architectural and performance improvements
- Develop a wizard-style tool for generating the archive description file
- Integration with digital object archives referenced by the data
- Improved handling of nested property groups and item relationships
- Advanced search
- More flexible configuration of free text searching rules
27 Alternative uses for tool
- For mothballed systems, the contents of a legacy database can be archived as XML, and Xinq can then generate online search and browse capability
- Prototyping tool for requirements analysis
- Related tool development: Xedit, a generic online update capability based on the same database description configuration file
28 More Information
- Project Page
- http://www.nla.gov.au/xinq
- SourceForge Entry
- http://sourceforge.net/projects/xinq/