Title: Building national and large-scale Internet Information Gateways
1. Building national and large-scale Internet Information Gateways
- A DESIRE Workshop
- WELCOME!
2. Building national and large-scale Internet Information Gateways
- Introductions and Welcome
- Nicky Ferguson, Institute for Learning and Research Technology, University of Bristol, UK
- Titia van der Werf, National Library of the Netherlands
4. What is an information gateway?
- Emma Place
- Institute for Learning and Research Technology
- University of Bristol, UK
5. "The Web is quickly becoming the world's fastest growing repository of data"
- Tim Berners-Lee, W3C Director and creator of the WWW
6. People are increasingly ...
- going to the Internet
- before they go to the library
7. Librarians are increasingly ...
- taking librarianship out of libraries and onto the Internet
8. Information gateways ...
- doing for Internet information resources
- what librarians do for books
9. Gateways are an Internet search tool
- to help people find resources on the Internet, e.g.
- electronic journals
- software
- datasets
- electronic books
- mailing lists / discussion groups (and their archives)
- articles / papers / reports
- bibliographic databases
- bibliographies
- organisational home pages
- educational materials
- news
- resource guides
10. They offer ...
- Linked collections of Internet resources via a database of resource descriptions. This can be
- browsed - thanks to classification
- searched - thanks to cataloguing
- quality controlled - thanks to selection
11. The key ingredient ...
- the semantics that only the human factor can bring
- subject specialists
- library / information professionals
12. Characteristics of an information gateway
- An online service providing links to Internet resources with
- semantic selection
- semantic description
- semantic classification
- at least partly semantic cataloguing
- (Traugott Koch, Netlab)
13. What gateways are NOT!
- Internet search engines
- e.g. AltaVista or Excite
14. What gateways are NOT (2)!
- Web directories
- e.g. Yahoo / The Open Directory
15. What gateways can be ...
- Virtual libraries involving
- distributed teams of librarians
- distributed databases that can be cross-searched
16. Some national gateway initiatives in Europe
- UK Resource Discovery Network
- The Netherlands - DutchESS
- Finland - Finnish Virtual Library Project
- Germany - Virtual subject libraries
- France - les Signets
17. A guided tour ...
- SOSIG
- The Social Science Information Gateway
- http://www.sosig.ac.uk/
- DutchESS
- http://www.konbib.nl/dutchess/
18. URLs
- SOSIG Scope Policy
- http://www.sosig.ac.uk/desire/escope.html
- SOSIG Selection Criteria
- http://www.sosig.ac.uk/desire/ecrit.html
- DutchESS Manual (in Dutch only)
- http://www.konbib.nl/dutchess/manual/
20. Information gateways
- 49 reasons for National Libraries to be cheerful :-)
- Nicky Ferguson, Institute for Learning and Research Technology, University of Bristol, UK
21. Why Gateways at all?
- a familiar place - a community centre
- intermediaries have always been important
- subject focus leads gently
22. Why Gateways at all?
- many users are inexpert users
- browsing: serendipity
- searching: precision
- both: quality
24. Why libraries?
- the natural metaphor
- browsing, reference desk
- expertise in relevant areas
- classification, acquisition, keywords
- information seeking behaviour
- guiding and helping users
- who else will do it better?
- the electronic librarian!
25. Why National Libraries?
- too much for one institution
- too much for one country
- the influence to collaborate externally
- and to co-ordinate internally
- e.g. the Finnish Virtual Library
26. Benefits, national
- save the cost of duplicated national effort
- in academic/public libraries and elsewhere
- spotlight on nationally funded research
- trade and business attracted as a result
- national profile increased
- by international collaboration
27. Benefits, library
- leading the way into the information age
- communicating with non-nerds
- access to huge high quality collections
- at lower cost than creating them
- integrate into existing structures
28. Benefits, users
- diverse resources brought together
- research, learning, leisure, enrichment
- also brought together
- access for a far wider population
- someone to ask
- what's where? what's what? what's good?
29. Benefits of collaboration
- access to many countries' efforts
- much work done on
- standards - technical and information
- rules and procedures
- formats, consistency
- quality controls and quality standards
30. The Ideal
- only create records for one nation's resources
- access records from all nations
- cross-searching
- across discipline
- across subject
- across language
31. The Message
- go forth and multiply (your gateways)
33. Information Gateways in perspective
- Rachel Heery
- UKOLN: The UK Office for Library and Information Networking, University of Bath
34. Information gateways in perspective: summary
- Information gateways as part of the resource discovery landscape
- spectrum of resource discovery initiatives
- variety of service models
- information gateways and metadata
- variety of metadata creation models
- metadata v. cataloguing
- collaboration and integration
- setting the stage (ROADS, DESIRE)
- expansion of services
- developing common approaches
35. Spectrum of resource discovery
- Selective services
- targeted coverage
- explicit selection policy
- value added description
- RDN (eLib) gateways
- Nordic Web Index
- DutchESS
- GEM, etc.
- Total services
- complete coverage
- business-driven selection
- shallow description
- AltaVista
- Google
- Yahoo, etc.
36. Characteristics of selective services
- breadth of coverage
- (selection criteria)
- quality of resource
- by subject area
- by region
- target audience
- depth of subject description
- hand-crafted
- metadata-aware harvesting
- use of standard classification schemes
- authority files applied
37. Metadata creation
- Who creates metadata?
- authors
- experts
- metadata creation agencies
- Where is the metadata?
- embedded in resource
- local on-site database
- third party databases
38. Metadata creation aphorisms
- do work as near the source as possible
- do it once, do it right!
- but we need to consider the benefit of
- a pattern of enhancement, an incremental approach
39. Metadata creation collaboration
- working with information providers
- linking libraries and publishers
- BIBLINK
- co-operation between libraries
- Intercat
- OCLC CORC project
- enhancing harvested metadata
40. Description of BIBLINK Workspace
- Publishers
- BIBLINK Workspace: a shared facility for storing and manipulating BIBLINK Workspace records
- Third parties, e.g. identification agencies (ISBN, ISSN, etc.)
- BIBLINK Workspace Administrator
- National Bibliographic Agencies
41. Shared approaches
- compatible technical solutions
- shared semantics (common metadata sets)
- shared syntax (HTML, RDF/XML)
- consistency of content (cataloguing rules)
42. Support activities
- ROADS
- DESIRE
- iMesh
- Range of associated information gateways
- DutchESS
- Finnish Virtual Library project
- EELS
- NOVAGate
- SOSIG
- Internet Scout
- etc.
44. Information Gateways and the international perspective
- Marianne Peereboom, KB, The Netherlands
- Dan Brickley, ILRT, UK
45. Why co-operate?
- enhancing Internet resource discovery for end-users
- access to much broader collections than any single gateway could offer, including high quality Internet resources on many subjects, from many countries, in many languages
- access to a large number of metadata records via a single, user-friendly interface
- the ability to locate new gateways that they may not have heard about
- the possibility to search a selection of gateways simultaneously as opposed to one by one
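The idea of searching a selection of gateways simultaneously can be illustrated with a small sketch. It assumes each gateway is exposed as a plain Python function that takes a query and returns (URL, title) pairs; in practice these would be protocol clients (for example Z39.50 or WHOIS++), and the gateway functions used here are hypothetical. Results are merged in gateway order and duplicate URLs are dropped.

```python
from concurrent.futures import ThreadPoolExecutor


def cross_search(query, gateways):
    """Query several gateways in parallel and merge their results.

    gateways: dict mapping a gateway name to a search function that takes a
    query string and returns a list of (url, title) pairs (hypothetical API).
    Returns (merged, failed): merged is a list of (gateway, url, title) with
    duplicate URLs removed; failed lists gateways whose search raised an error.
    """
    merged, seen, failed = [], set(), []
    with ThreadPoolExecutor(max_workers=max(1, len(gateways))) as pool:
        # Submit every search first so the gateways are queried concurrently.
        futures = {name: pool.submit(search, query)
                   for name, search in gateways.items()}
        # Collect in gateway order so the merged list is deterministic.
        for name, future in futures.items():
            try:
                hits = future.result()
            except Exception:  # one unreachable gateway should not sink the rest
                failed.append(name)
                continue
            for url, title in hits:
                key = url.rstrip("/")  # treat "http://x/" and "http://x" as one
                if key not in seen:
                    seen.add(key)
                    merged.append((name, url, title))
    return merged, failed
```

Reporting failed gateways separately, rather than silently dropping them, lets a user interface tell people which collections were not searched.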
46. Why co-operate?
- improving the efficiency and sustainability of gateway services
- use established technologies, methods and practices - and avoid starting from scratch
- divide responsibilities for creating or sharing metadata records - and avoid duplication of effort
- combine effort for technical development - and avoid repetition of work and errors
- create joint publicity, training and promotion
- share staff effort (management/technical/administrative/cataloguing)
- create shared strategies for long-term sustainability
47. Why not?
- political or funding issues
- competition instead of co-operation?
- safeguarding own identity and position in the marketplace
- possible disadvantages of co-operation
- co-operation can incur extra expense
- intellectual property rights
- agreeing on aims and objectives
48. Models for co-operation
- co-operative agreements for metadata records
- creation of metadata records
- use of metadata records
- building integrated services
- pointing to other services
- mirroring other services
- cross searching / cross browsing
- integrated interface
- customized interfaces to one collection of records
- interoperability
49. Interoperability
- being able to search, browse and retrieve information from distributed gateways based on (broadly) the same technologies, protocols and metadata formats
- being able to search, browse and retrieve information from distributed gateways based on a variety of software solutions, search-retrieve protocols and metadata formats
50. Standards
- search and retrieve (or indexing) protocols
- Z39.50, Whois, LDAP
- metadata formats
- cataloguing standards
- subject indexing schemes
51. Key initiatives
- ROADS
- software and standards for developing information gateways which can be cross-searched with any other gateway
- http://www.ilrt.bris.ac.uk/roads/
- DESIRE
- European project which aims to promote the development of the gateway model in Europe
- http://www.desire.org/
52. Key initiatives
- ISAAC
- a research project of Internet Scout in the USA
- aims to create an architecture that enables repositories of metadata records to be cross-searched
- http://scout.cs.wisc.edu/research/index.html
53. Key initiatives (2)
- iMesh
- informal discussion forum to promote collaboration amongst information gateways
- http://www.desire.org/html/subjectgateways/community/imesh/
- iMesh Toolkit
- project funded by the National Science Foundation (USA) and JISC (UK) to develop an architecture and toolkit for distributed subject gateways (building on ROADS and ISAAC)
54. Reynard project
- European project, 5th Framework Programme
- Duration: Jan 2000 - June 2002
- Aims
- 1. to provide a single point of access to, and a consistent presentation of, national subject services in Europe
- 2. to exploit existing services by creating a shared test environment within which national initiatives will experiment with co-operative efforts, devise models for sharing metadata, agree on technical solutions and short-cycle innovation, develop business models and foster standardisation activities
55. Reynard partners
- national libraries and national resource discovery network organisations
- research libraries who have acquired expertise in different areas of subject gateway development
- library-related technology centres and university computer centres
56. Demonstrations
- http://www.desire.org/html/research/demonstrations/
58. Panel Session
- Nicky Ferguson, ILRT (Chair)
- Eric Miller, OCLC
- Toini Alhainen, Finnish Virtual Library Project
- Titia van der Werf, National Library of the Netherlands
- Rachel Heery, UK Office for Library and Information Networking
- Debra Hiom, The Social Science Information Gateway
- Traugott Koch, Electronic Engineering Library, Sweden
59. Questions for the panel
- 1) Why give Internet resources different treatment in cataloguing? (e.g. why use metadata such as Dublin Core rather than MARC or ISBD-ER, and why catalogue resources into something separate from the library OPAC?)
- 2) What are the key strengths and weaknesses of the information gateway approach?
- 3) How far should there be a national strategy, and is there one in your country?
- 4) What lies in the future? Is creating national information gateways a sound foundation for future developments? (e.g. in the light of forthcoming technologies / metadata formats and possible international collaborations)
61. Sustaining resource description
- Lorcan Dempsey
- UKOLN: The UK Office for Library and Information Networking, University of Bath
63. Setting up a gateway: practical issues
- Emma Place
- Institute for Learning and Research Technology
- University of Bristol, UK
64. Three parts ...
- 1) Overview - Emma Place
- 2) Information issues - Marianne Peereboom
- 3) Technical issues - Paul Hollands
65. Coming Soon ...
- The DESIRE Information Gateways Handbook
- www.desire.org
66. So what do you need to set up a gateway?
67. Basic ingredients
- money
- people
- equipment
- time
68. Phases in a gateway project
- 1) planning
- 2) set-up
- 3) building your collection
- 4) running the service
- 5) ongoing maintenance / development
- 6) adding new features
- 7) managing a mature gateway
69. (1) Planning
- what is ideal vs. what is possible
- money / resources
- strategy, aims, objectives
70. Scoping
- Components
- target audience
- Your decisions!
- Scope Policy
- Selection Policy
71. Staff and skills required
- Skills needed
- subject expertise
- information
- technical
- interface design
- training / publicity
- management
- Your decisions!
- central staff
- distributed staff
72. System requirements
- What you need
- network connectivity
- hardware
- Web server software
- database and software
- PCs and materials for staff
- Your decisions!
- standard gateway software
- your own system
73. (2) Set up
- Components
- database
- user interface
- admin interface
- records
- Your decisions!
- metadata formats
- classification scheme
- tools and guidelines
- cataloguing rules
74. (3) Building your collection
- finding resources
- selecting resources
- describing resources
- ensuring quality, consistency and coverage
75. (4) Running the service
- building a user community
- publicity and promotion
- user support and training
- announcing What's New
- day-to-day management
76. (5) Ongoing maintenance
- collection management
- link checking
- editing records
- updating resource descriptions
- server integrity and functionality
77. (6) Adding new features
- a harvested index
- thesaurus feature
- primary content
- community areas
- cross-search features
- user profiles / personalised views
- mirrors
78. (7) Mature gateways
- scalability issues
- collaboration
- displaying larger collections
- rising maintenance costs
- upgrading the system
- future proofing
- hardware
- software
- content
79. So that's the overview
- now let's get down to detail ...
80. Information Issues
- Marianne Peereboom
- Koninklijke Bibliotheek
81. Workflow for information staff
- selection of resources
- cataloguing of resources
- editing and adding resources to database
- housekeeping: maintenance of resources
82. Selection
- Resources for the gateway will be selected by skilled staff (subject specialists, librarians, information specialists). Their value judgement will be guided by
- Scope policy
- which types of resources will be included in the catalogue
- Quality criteria
- criteria to judge whether a resource that falls within the scope of the gateway is of sufficiently high quality
83. Selection policy
- helps users to appreciate that the service is selective and quality controlled
- helps users to understand what type of quality information they will find when using the service
- ensures consistency of selection by individual staff members
- ensures consistency of selection among members of a (distributed) team
- can be used in training new staff
84. Scope policy
- First identify
- target user group
- the information needs of the user group
- aims and objectives of the gateway
- balance what you would like to cover with what you have the resources to cover
85. Scope policy
- Metadata and cataloguing
- granularity
- resource description
- Geographical issues
- geographical constraints
- language
- Information coverage
- subject matter
- acceptable types of resource
- acceptable sources
- acceptable level of difficulty
- advertising
- Access
- cost
- technology
- registration
- special needs
86. Selection criteria
- Content criteria
- validity
- authority and reputation of source
- accuracy
- comprehensiveness
- uniqueness
- composition and organisation
- currency, and adequacy of maintenance
- Form criteria
- ease of navigation
- provision of user support
- use of recognised standards
- appropriate use of technology
- aesthetics
- Process criteria (the system)
- information integrity (info provider)
- site integrity (webmaster)
- system integrity (systems administrator)
87. Examples
- Make your scope policy and selection criteria available to your users, so they will know what to expect
- Scout Report
- http://scout.cs.wisc.edu/report/sr/criteria.html
- EELS: Engineering Electronic Library
- http://www.ub2.lu.se/eel/qualcrit.html
- SOSIG
- http://sosig.ac.uk/desire/ecrit.html
88. Resource description
- To be able to create and maintain resource descriptions you need
- a metadata format
- cataloguing rules
- database maintenance tools
89. Resource description: types of data
- A resource description will record different types of information
- bibliographic-type descriptive information
- author, title, publisher, location etc.
- subject information
- classification code, keywords, thesaurus terms
- administrative metadata
- record creation date, intellectual property, individuals responsible for selection, cataloguing, etc.
90. Metadata
- Types of metadata formats
- 1) relatively unstructured data automatically extracted from resources and indexed for use by robot-based Web services
- 2) structured formats simple enough to be created by non-specialist users. Usually manually created, but some data may be extracted automatically.
- Examples: ROADS, Dublin Core
- 3) specialised formats, developed to organise complex relations between objects or collections of objects, and often based on implementations of SGML
- Examples: TEI header, MARC
91. Metadata format
- Considerations when choosing a metadata format
- which minimum set of fields do you need to enable the modes of access/search functionality you want to provide?
- do you want to enable interoperability with other services in the future? Possibilities for conversion.
- are there any conventions in your subject community?
- does the software you have chosen for your service impose restrictions on the format you may use?
- who is going to do the cataloguing?
92. Dublin Core (DC)
- Content
- Title
- Subject
- Description
- Source
- Relation
- Coverage
- Type
- Intellectual property
- Creator
- Publisher
- Contributor
- Rights
- Instantiation
- Date
- Language
- Format
- Identifier
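As an illustration of how the fifteen elements above might be embedded in a Web page, the sketch below renders a record as HTML meta tags named DC.Element, in the style of RFC 2731. The example values and the exact tag layout are assumptions; current DCMI guidance should be checked before relying on this form.

```python
from html import escape

# The 15 Dublin Core elements, grouped as on the slide.
DC_GROUPS = {
    "Content": ["Title", "Subject", "Description", "Source",
                "Relation", "Coverage", "Type"],
    "Intellectual property": ["Creator", "Publisher", "Contributor", "Rights"],
    "Instantiation": ["Date", "Language", "Format", "Identifier"],
}
DC_ELEMENTS = [element for group in DC_GROUPS.values() for element in group]


def dc_meta_tags(record):
    """Render {element: value or list of values} as HTML <meta> tags."""
    unknown = set(record) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element in DC_ELEMENTS:  # stable output order, grouped as above
        values = record.get(element, [])
        if isinstance(values, str):
            values = [values]
        for value in values:  # every DC element is optional and repeatable
            lines.append(f'<meta name="DC.{element}" content="{escape(value)}">')
    return "\n".join(lines)
```

Because every element is optional and repeatable, a record with two subjects simply produces two DC.Subject tags.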
93. DC Examples
- EdNA: Education Network Australia
- http://www.edna.edu.au/EdNA/
- Combination of DC and local EdNA elements
- AGRIGATE: Agriculture Information Gateway (Australia)
- http://www.agrigate.edu.au/index.html
- Overview of Metadata Fields
94. A simple format: DutchESS
- Administrative
- Library subject specialist code
- Record creation date
- Record last verified date
- Record last update date
- Elements
- Mandatory
- Title
- BC code (classification)
- URL
- Annotation (in English)
- Optional
- Author
- Identification (not URL, e.g. ISSN)
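A simple format like this maps naturally onto a record structure with a validity check on the mandatory elements. The sketch below is hypothetical: the field names and the checks are assumptions for illustration, not the actual DutchESS database schema or cataloguing rules.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class DutchESSRecord:
    # Administrative data (field names are hypothetical, not DutchESS's own)
    specialist_code: str
    created: date
    last_verified: date
    last_updated: date
    # Mandatory elements
    title: str
    bc_code: str           # Basic Classification (BC) code
    url: str
    annotation_en: str     # annotation, in English
    # Optional elements
    author: Optional[str] = None
    identification: Optional[str] = None  # e.g. an ISSN, never the URL

    def problems(self) -> List[str]:
        """Return a list of problems found; an empty list means the record passes."""
        found = []
        for name in ("title", "bc_code", "url", "annotation_en"):
            if not getattr(self, name).strip():
                found.append(f"missing mandatory element: {name}")
        if self.url and not self.url.startswith(("http://", "https://", "ftp://")):
            found.append("url does not look like a URL")
        if self.last_updated < self.created:
            found.append("last update predates record creation")
        return found
```

Keeping the check this small matches the point of a simple format: a subject specialist can catalogue without a skilled cataloguer, and the software catches only obvious omissions.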
95. DutchESS format
- Characteristics
- home grown ...
- mappings: ROADS (future DC?)
- easy to maintain; cataloguing by subject specialists possible - no need for skilled cataloguers
- functionality restricted by simple format and cataloguing rules
96. More complicated ... ROADS
- ROADS offers a complete solution to setting up a gateway, with metadata templates included
- has different templates for different types of resources, e.g. software, document, service, mailarchive
- some templates have up to 80 fields, but gateways create minimum sets involving fewer
- easily converted to other formats (MARC/Dublin Core)
- increases functionality
97. Providing subject access
- Classification
- describing the broad subject area or discipline a resource belongs to
- used to group documents in well-defined subject areas
- Keywords
- give a more detailed description of an individual document
- used as a searching aid
- Thesauri
- controlled vocabulary with defined (hierarchical) relationships between terms
- structured search for relevant terms possible for both indexer and user
98. Classification
- Types of schemes
- universal scheme (DDC, UDC) - example: CyberDewey
- national scheme (BC, SAB) - example: DutchESS
- subject-specific scheme (Ei, NLM) - examples: EELS, Finnish Virtual Library
- home grown (Yahoo!)
- Main advantages
- good basis for browsing structure
- multilingual access possible
- interoperability (cross browsing)
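The point that a classification scheme gives a good basis for a browsing structure can be sketched as follows. It assumes a hypothetical dotted notation where a class's parent is found by dropping the last segment (real schemes such as DDC or UDC have their own notation rules), and the captions are made up.

```python
from collections import Counter


def ancestors(code):
    """'05.42.1' -> ['05', '05.42', '05.42.1'] (hypothetical dotted notation)."""
    parts = code.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def browse_counts(record_codes):
    """Count records per class, including records filed under narrower classes."""
    counts = Counter()
    for code in record_codes:
        counts.update(ancestors(code))
    return counts


def render(counts, captions):
    """List classes in notation order, indented by depth, with record counts."""
    lines = []
    for code in sorted(counts, key=lambda c: [int(p) for p in c.split(".")]):
        indent = "  " * code.count(".")
        lines.append(f"{indent}{code} {captions.get(code, '?')} ({counts[code]})")
    return "\n".join(lines)
```

Counting records at every broader class lets a browse page show how much lies beneath each heading before the user clicks into it.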
99. Keywords
- Useful as an extra search aid
- Uncontrolled keywords
- problems with different spellings, (near-)synonyms
- Controlled keywords
- general (LCSH: Library of Congress Subject Headings)
- subject specific (MeSH: Medical Subject Headings)
- user will need access to the vocabulary to be able to find the right term
100. Thesauri
- Semantic relations between terms defined
- Broader term
- Narrower term
- Top term
- Related term
- Preferred term
- Non-preferred term
- Use
- to help users find the relevant term (SOSIG)
- as a basis for the browsing structure, in place of a classification scheme (OMNI)
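The thesaurus relations listed above can be sketched as a small data structure. The terms in the usage example are made up, and the single-broader-term restriction is a simplification (many thesauri allow several broader terms). The sketch shows two common uses: redirecting a non-preferred term to its preferred term, and expanding a search to all narrower terms.

```python
class Thesaurus:
    """A minimal thesaurus with USE/UF, BT/NT and RT relations."""

    def __init__(self):
        self.use = {}       # non-preferred term (lower case) -> preferred term
        self.broader = {}   # term -> its single broader term (a simplification)
        self.narrower = {}  # term -> set of narrower terms
        self.related = {}   # term -> set of related terms (symmetric)

    def add_use(self, non_preferred, preferred):
        self.use[non_preferred.lower()] = preferred

    def add_bt(self, term, broader):
        self.broader[term] = broader
        self.narrower.setdefault(broader, set()).add(term)

    def add_rt(self, a, b):
        self.related.setdefault(a, set()).add(b)
        self.related.setdefault(b, set()).add(a)

    def preferred(self, term):
        """Follow a USE reference if there is one; otherwise keep the term."""
        return self.use.get(term.lower(), term)

    def top_term(self, term):
        """Climb broader-term links to the top of the hierarchy."""
        term = self.preferred(term)
        while term in self.broader:
            term = self.broader[term]
        return term

    def expand(self, term):
        """Preferred term plus all narrower terms, e.g. to widen a search."""
        result, stack = [], [self.preferred(term)]
        while stack:
            current = stack.pop()
            if current not in result:
                result.append(current)
                stack.extend(sorted(self.narrower.get(current, ()), reverse=True))
        return result
```

For example, after th.add_use("Unemployed people", "Unemployment"), a user who types "unemployed people" is led to the preferred term, and expanding a broad term pulls in everything indexed under its narrower terms.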
101. Cataloguing a resource
- Examples
- HTML form to submit a resource for DutchESS
- http://www.konbib.nl/nbw-cgi/usr/nbw_aanmeldform.pl
- Cataloguing template used in SOSIG
- http://www.ukoln.ac.uk/metadata/roads/templates/
102. Housekeeping
- Identify tasks and the staff responsible for them
- Validating records to ensure that the record is accurate
- Link checking records to ensure that resources are available
- Updating resource descriptions to ensure that the record still adequately reflects the content of the resource
- Maintenance tool
- to enable appointed staff to add, edit and remove records, provide access to link checker output, etc.
- DutchESS Maintenance tool
103. Staff support
- Provide training
- Face-to-face workshops
- Online training
- Provide online documentation
- DutchESS Manual
- http://www.konbib.nl/dutchess/manual/
- SOSIG Admin Centre
- http://sosig.ac.uk/admin/section-editors/
- (password protected)
104. Technical issues
- Paul Hollands
- Institute for Learning and Research Technology
- University of Bristol, UK
105. Components of a subject gateway architecture
- Requirements for your back-end database
- nested boolean and fielded searching
- truncation / stemming
- browseable indexes
- ranking
- stored searches
- batched results
- flexibility is important; the database format itself is not
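Several of these requirements (nested boolean and fielded searching, truncation, batched results) can be shown with a small in-memory inverted index. This is a sketch under assumptions: queries are nested tuples such as ("AND", ("title", "gateway*"), ("subject", "sociology")), right-hand truncation is marked with "*", and ranking and stored searches are not covered.

```python
import re
from collections import defaultdict


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class FieldIndex:
    """Inverted index: (field, word) -> set of record ids."""

    def __init__(self, records):
        self.all_ids = set(records)
        self.postings = defaultdict(set)
        for rid, fields in records.items():
            for field, text in fields.items():
                for word in tokenize(text):
                    self.postings[(field, word)].add(rid)

    def term(self, field, word):
        word = word.lower()
        if word.endswith("*"):  # right-hand truncation: match by prefix
            stem = word[:-1]
            hits = set()
            for (f, w), ids in self.postings.items():
                if f == field and w.startswith(stem):
                    hits |= ids
            return hits
        return set(self.postings.get((field, word), ()))

    def search(self, query):
        """Evaluate a nested boolean query recursively."""
        op, *args = query
        if op == "AND":
            return set.intersection(*(self.search(q) for q in args))
        if op == "OR":
            return set.union(*(self.search(q) for q in args))
        if op == "NOT":
            return self.all_ids - self.search(args[0])
        return self.term(op, args[0])  # fielded term: (field, word)


def batches(ids, size):
    """Split a result set into pages of at most `size` ids."""
    ordered = sorted(ids)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]
```

Keeping the query as a tree rather than a string makes nesting explicit; a real gateway would parse the user's search form into such a tree before evaluating it.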
106. Components: Web interfaces (1)
- Front of house
- search forms (simple and advanced)
- results with a variety of format options
- browseable directory-style subject listings
- forms to suggest new entries
- personalised portal-style "my gateway" interfaces
107. Components: Web interfaces (2)
- Back office / administration interfaces
- metadata creation / cataloguing
- database administration
- edit and delete records and authority lists
- managing indexing (immediate and deferred)
- link checking
- handling new submissions
- checking for duplicates and database integrity
108. Interoperability
- can you accept queries from and send results back to clients using
- WHOIS?
- Z39.50?
- SQL?
- LDAP?
- can you generate centroids to become part of a cross-searchable mesh of servers?
- can you produce results in a range of metadata formats?
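A centroid is a compact summary of what a server holds, which an index server can use to decide where to send a query. The sketch below is a simplification of the WHOIS++ idea: each centroid is taken to be the set of words appearing in each attribute across a server's records, and queries are routed only to servers whose centroid contains every query word. The server names and records are made up. Because words are pooled across records, routing can produce false positives (no single record need match all terms) but never misses a server that could match.

```python
import re


def tokenize(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def make_centroid(records):
    """Summarise a server's records as attribute -> set of words it contains."""
    centroid = {}
    for record in records:
        for attribute, value in record.items():
            centroid.setdefault(attribute, set()).update(tokenize(value))
    return centroid


def route(query, centroids):
    """Return the servers whose centroid could hold a record matching the query.

    query: dict attribute -> word; all terms must appear (AND).
    centroids: dict server name -> centroid.
    """
    return sorted(
        name for name, centroid in centroids.items()
        if all(word.lower() in centroid.get(attr, set())
               for attr, word in query.items())
    )
```

Only the servers returned by route() then need to be searched in full, which is what keeps a mesh of many gateways cheap to cross-search.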
109. Other tools
- harvested databases
- automated cataloguing
- user profiles
- thesauri
110. Open source
- do others have the opportunity to build on your work to the benefit of you both?
- will you have to pay a developer to make changes to the functionality of your system rather than do it yourself?
- will you have to wait for your developer to produce bug fixes your staff could make themselves given the source code?
111. ROADS gateway toolkit
- www.ilrt.bris.ac.uk/roads
- www.opensource.ac.uk
113. Discussion/Surgery
- 1. technical issues
- 2. information management
- 3. organisational and day-to-day management
- 4. funding and business models
114. Closing Address
- Nicky Ferguson
- Institute for Learning and Research Technology
- University of Bristol, UK
115. Lunch ... then surgery time