Delivering MARC/XML records

1
Delivering MARC/XML records from the Library of Congress catalogue
using the open protocols SRW/U and Z39.50
Mike Taylor, Index Data mike_at_indexdata.com
2
Overview
  • Where we're headed in the next half-hour
  • Existing standards for library catalogues
  • The new XML equivalents of these standards
  • Providing XML access to existing catalogues
  • Two services running from two databases
  • Two services running from a single database
  • New gateway running over the existing service
  • The Library of Congress's solution

Delivering MARCXML using SRW/U
Mike Taylor, Index Data <mike_at_indexdata.com>
3
Existing standards for catalogues
  • The value of existing standards is well understood:
      • MARC (MAchine Readable Catalogue) records
      • ISO 2709 (interchange format for MARC)
      • ANSI/NISO Z39.50 (search and retrieve on the Internet)
  • These standards allow interoperability and co-operation between
    libraries that other fields can only dream about.
  • (Librarians don't know how lucky they are!)

4
Z39.50 for searching catalogues
Z39.50 client
Z39.50 (fetching MARC records)
Library of Congress Z39.50 server
5
Z39.50 for searching catalogues
Z39.50 client
Z39.50
Library of Congress Z39.50 server
British Library Z39.50 server
6
Z39.50 for searching catalogues
Z39.50 client
Z39.50
Library of Congress Z39.50 server
British Library Z39.50 server
Local catalogue Z39.50 server
7
Z39.50 for searching multiple catalogues
Metasearching Z39.50 client
Z39.50
Z39.50
Z39.50
Library of Congress Z39.50 server
British Library Z39.50 server
Local catalogue Z39.50 server
8
Trouble in paradise
Then the serpent saith unto Adam, "Lo, why doth thy catalogue service
not use XML?" And Adam saith, "Verily, Z39.50 worketh just fine." But
the serpent, who was subtle of tongue, saith unto him, "But XML is more
fashionable." And, behold, Adam was deceived, and did fall.
-- The Book of Standards, ch. 3, v. 4-6.
9
Welcome to the 21st Century
Everything must be XML
Metasearching Z39.50 client
Z39.50
Z39.50
Z39.50
Library of Congress Z39.50 server
British Library Z39.50 server
Local catalogue Z39.50 server
10
Welcome to the 21st Century
Resistance is useless!
Metasearching Z39.50 client
Z39.50
Z39.50
Z39.50
Library of Congress Z39.50 server
British Library Z39.50 server
Local catalogue Z39.50 server
11
Catalogue standards in an XML world
The binary USMARC format is superseded by MARCXML.

"As many of the original developers of Dublin Core were Americans,
various parochial national standards were referenced. This will
hopefully get fixed with the belated discovery of the rest of the
planet." (Unattributed, sadly.)

Enter MarcXchange, a MARCXML superset that can represent all the
national MARC formats (DANMARC, etc.). (Though repairing MARCXML might
have been better.)
12
Catalogue standards in an XML world
The binary Z39.50 protocol is superseded by SRU (Search/Retrieve by
URL). This is a NISO-registered standard for expressing queries using
rich URLs, to obtain XML responses that contain records matching the
query.

http://sru.miketaylor.org.uk/sru.pl?version=1.1&operation=searchRetrieve&query=dinosaur&startRecord=1&maximumRecords=1&recordSchema=dc
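As an illustration of how such a rich URL is put together, here is a minimal Python sketch that assembles the searchRetrieve parameters named on this slide. The function name `build_sru_url` and its defaults are hypothetical, chosen only for this example; the parameter names are the ones in the URL above.

```python
from urllib.parse import urlencode


def build_sru_url(base_url, query, start_record=1, maximum_records=1,
                  record_schema="dc", version="1.1"):
    """Return an SRU searchRetrieve URL (hypothetical helper for illustration)."""
    params = {
        "version": version,
        "operation": "searchRetrieve",
        "query": query,                     # a CQL query string
        "startRecord": start_record,
        "maximumRecords": maximum_records,
        "recordSchema": record_schema,
    }
    # urlencode percent-escapes quotes and other reserved characters,
    # and turns spaces into "+", so any CQL query is safe to embed.
    return base_url + "?" + urlencode(params)


url = build_sru_url("http://sru.miketaylor.org.uk/sru.pl", "dinosaur")
print(url)
# http://sru.miketaylor.org.uk/sru.pl?version=1.1&operation=searchRetrieve&query=dinosaur&startRecord=1&maximumRecords=1&recordSchema=dc
```

Fetching the resulting URL with any HTTP client should then return an XML response, provided the server is still running.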
13
An SRU response (single DC record)
<?xml version="1.0"?>
<zs:searchRetrieveResponse xmlns:zs='http://www.loc.gov/zing/srw/'>
  <zs:version>1.1</zs:version>
  <zs:numberOfRecords>29</zs:numberOfRecords>
  <zs:records>
    <zs:record>
      <zs:recordSchema>info:srw/schema/1/dc-v1.1</zs:recordSchema>
      <zs:recordPacking>xml</zs:recordPacking>
      <zs:recordPosition>1</zs:recordPosition>
      <zs:recordData>
        <srw_dc:dc xmlns:srw_dc="info:srw/schema/1/dc-schema"
                   xmlns="http://purl.org/dc/elements/1.1/">
          <title>Fossils</title>
          <creator>Lappi, Megan.</creator>
          <type>text</type>
          <publisher>New York, NY Weigl Publishers</publisher>
          <date>2005</date>
          <language>en</language>
          <description>Studying fossils -- Fossil facts -- Gone
            forever -- A fossil is born -- From bone to stone --
            Insects in amber -- Dinosaur footprints</description>
          <identifier>http://www.loc.gov/catdir/toc/ecip0415/2004004136.html</identifier>
          <identifier>URN:ISBN:1590362136</identifier>
        </srw_dc:dc>
      </zs:recordData>
    </zs:record>
  </zs:records>
</zs:searchRetrieveResponse>
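To show how a client might consume a response like this one, here is a small Python sketch using only the standard library. The function name `parse_sru_response` is hypothetical, and the embedded sample is an abbreviated copy of the record on this slide; the two namespace URIs are the ones that appear in the response itself.

```python
import xml.etree.ElementTree as ET

NS = {
    "zs": "http://www.loc.gov/zing/srw/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_sru_response(xml_text):
    """Return (hit count, list of {dc element name: [values]}) for a response."""
    root = ET.fromstring(xml_text)
    total = int(root.findtext("zs:numberOfRecords", default="0", namespaces=NS))
    records = []
    for record in root.iterfind("zs:records/zs:record", NS):
        fields = {}
        data = record.find("zs:recordData", NS)
        if data is not None:
            for element in data.iter():
                # Keep only elements in the Dublin Core namespace.
                if element.tag.startswith("{" + NS["dc"] + "}"):
                    name = element.tag.split("}", 1)[1]
                    fields.setdefault(name, []).append((element.text or "").strip())
        records.append(fields)
    return total, records


SAMPLE = """<?xml version="1.0"?>
<zs:searchRetrieveResponse xmlns:zs='http://www.loc.gov/zing/srw/'>
  <zs:version>1.1</zs:version>
  <zs:numberOfRecords>29</zs:numberOfRecords>
  <zs:records><zs:record>
    <zs:recordSchema>info:srw/schema/1/dc-v1.1</zs:recordSchema>
    <zs:recordPacking>xml</zs:recordPacking>
    <zs:recordPosition>1</zs:recordPosition>
    <zs:recordData>
      <srw_dc:dc xmlns:srw_dc="info:srw/schema/1/dc-schema"
                 xmlns="http://purl.org/dc/elements/1.1/">
        <title>Fossils</title>
        <creator>Lappi, Megan.</creator>
        <identifier>URN:ISBN:1590362136</identifier>
      </srw_dc:dc>
    </zs:recordData>
  </zs:record></zs:records>
</zs:searchRetrieveResponse>"""

total, records = parse_sru_response(SAMPLE)
print(total, records[0]["title"])
```

Note that `numberOfRecords` is the total number of hits (29 here), not the number of records returned in this particular response (one).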
16
SRU's big brother: SRW
  • SRU works by fetching rich URLs.
  • SRW (Search/Retrieve Webservice) works over SOAP.
  • In theory, SRW is more powerful and flexible than SRU.
  • In practice, it is hard to implement and runs more slowly.
  • It is still important because many Big Players (Microsoft, IBM,
    etc.) have a big investment in SOAP.
  • However, most implementations have used SRU. With HTTP/1.1
    persistent connections, performance is fine.

17
SRU's query language: CQL
CQL (Common Query Language) is used by SRU and SRW. It may also be
used in other contexts (including Z39.50). Its syntax is easy to
learn, but very expressive.

  dinosaur
  title=dinosaur
  title=(dinosaur or pterosaur) and author=martill
  dc.title=saur and dc.author=martill
  title exact "the complete dinosaur" and date < 2000
  name =/phonetic "smith"
  fish prox/distance<3/unit=sentence frog
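Queries like these are often assembled by client code from user input. The sketch below shows one way that might be done in Python. The helper names (`cql_term`, `cql_clause`, `cql_and`) are hypothetical, and the quoting rule is an assumption made for this example: a term is double-quoted when it is empty, is one of the boolean keywords, or contains whitespace, parentheses, `=`, `<`, `>`, `/` or a double quote.

```python
import re

# Characters assumed to require quoting (an assumption of this sketch).
_SPECIAL = re.compile(r'[\s()=<>/"]')
_RESERVED = {"and", "or", "not", "prox"}


def cql_term(term):
    """Quote a search term if it could otherwise be misread as CQL syntax."""
    if term == "" or term.lower() in _RESERVED or _SPECIAL.search(term):
        escaped = term.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return term


def cql_clause(index, relation, term):
    """Build a single 'index relation term' search clause."""
    return f"{index} {relation} {cql_term(term)}"


def cql_and(*clauses):
    """Combine clauses with 'and', parenthesising each to keep grouping explicit."""
    return " and ".join(f"({clause})" for clause in clauses)


query = cql_and(
    cql_clause("title", "exact", "the complete dinosaur"),
    cql_clause("date", "<", "2000"),
)
print(query)
# (title exact "the complete dinosaur") and (date < 2000)
```

The extra parentheses are redundant for a simple conjunction, but they avoid surprises when a clause itself contains boolean operators.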
18
Now what?
  • We have:
      • A mature, functional infrastructure based on MARC and Z39.50
      • A world out there that is comfortable with XML-based technology
      • An XML-based equivalent of MARC (MARCXML/MarcXchange)
      • An XML-based equivalent of Z39.50 (SRU)
  • But we don't have:
      • Actual running SRU servers that deliver MARCXML records.
  • Can we get there from here?

19
Server providers don't want to switch
Z39.50 client
Z39.50
Uh-oh!
Library of Congress SRU server
20
Client applications don't want to switch
SRU client
SRU
Uh-oh!
Library of Congress Z39.50 server
21
Transition period: run both services
Z39.50 client
SRU client
Z39.50
SRU
Library of Congress Z39.50 server
Library of Congress SRU server
22
Transition period: run both services
  • This approach gives client applications a choice:
      • Existing client applications continue to work
      • New applications can be built using new technology
  • This flexibility comes at a cost to the service providers, who
    have to provide not one but two services.
  • How can they do this? There are three approaches.

24
Why the two-database approach sucks
  • The two-database approach has the advantage of conceptual and
    operational simplicity. The two separate systems can be
    maintained by separate teams.
  • However, THE TWO DATABASES HAVE TO BE KEPT SYNCHRONISED.
  • At best this entails duplication of effort.
  • At worst, it fails completely, and a record fetched from one
    database may be different from the same record fetched from the
    other database. (If it exists at all.)

30
Advantages of the 1D2S approach
  • When both services use data from the same database, only one copy
    of the database has to be maintained.
  • This approach has several advantages:
      • Eliminates duplication
      • Reduces redundancy
      • Reduces redundancy
      • Eliminates duplication

31
The horrible truth
When the database (and Z39.50 server) are part of
an integrated proprietary system, the SRU server
runs into a brick wall.
Library of Congress Z39.50 server
Library of Congress SRU server
No API!
Proprietary database
Opaque black box
32
The solution
Z39.50 IS the API!
Library of Congress Z39.50 server
Library of Congress SRU server
Black box with a little hole
Proprietary database
33
Why this is so cute
  • When the SRU server uses Z39.50 as its API to the database, it is
    an SRU-to-Z39.50 gateway. Its front-end is an SRU server and its
    back-end is a Z39.50 client.
  • This rocks because:
      • No duplication of data is necessary
      • No co-operation is necessary from the existing software
      • Use of the standard Z39.50 protocol as the API to the database
        means that THE SAME GATEWAY can be used to provide SRU access
        to ANY CATALOGUE that is already available via Z39.50.

34
A novel application of Z39.50
Z39.50 is most often used to allow a client to
query a remote server. Here we are using it as a
tightly integrated part of a locally provided
service -- the gateway will typically run on the
same machine as the Z39.50 server, or on
a nearby machine on the same LAN. HOWEVER,
because Z39.50 is a network API rather than a
link-time API, other interesting arrangements are
possible.
35
Typical architecture: integrated SRU
SRU client
SRU
Library of Congress Z39.50 server
Library of Congress SRU server
Proprietary database
Opaque black box
36
Alternative architecture: 3rd party SRU
Running in England
SRU client
Running in USA
Library of Congress Z39.50 server
SRU
Denmark
3rd party service SRU server
Proprietary database
Opaque black box
38
What's it like?
  • SRU client software neither knows nor cares that the server it is
    connected to is really a gateway.
  • Application user knows nothing about the Z39.50 database.
  • You might expect that performance would degrade due to the
    additional step.
  • In practice, with a high-quality gateway, performance of the SRU
    server greatly exceeds that of the underlying Z39.50 server.
    (This is done using magic.)

39
The Library of Congress's solution
The Library of Congress contracted Index Data (that's us) to build an
SRU-to-Z39.50 gateway for them. Having built it, we released it under
an Open Source licence (the GNU General Public Licence).

The LC SRU server is available to anyone at
http://z3950.loc.gov:7090/Voyager

The gateway is freely available to download at
http://indexdata.com/yazproxy/
40
(Digression: why is it called YAZ Proxy?)
  • YAZ is our battle-tested and widely deployed Z39.50 toolkit. (It
    powers 2/3 of all Z39.50 clients and servers worldwide.)
  • YAZ Proxy is so called because it acts as a Z39.50-to-Z39.50
    gateway as well as SRU-to-Z39.50 (and SRW-to-Z39.50).
  • Why would you want a Z39.50 proxy? For the same reasons you want
    a Web proxy such as Squid:
      • Reduce load on the underlying server
      • Improve client performance through caching
      • Protect fragile back-end by sanitising client requests
      • Balance load over multiple back-end servers

41
What YAZ Proxy does
  • For each SRU Search Request that it receives, YAZ Proxy:
      • Translates the CQL query into a Z39.50 Type-1 query
      • Embeds the translated query in a Z39.50 Search Request
      • Sends the request to the back-end server
      • (Asynchronously) awaits the Z39.50 Search Response
      • Extracts the MARC records from the response
      • Converts them into MARCXML
      • Embeds the converted records in an SRU Search Response
      • Returns the response to the client
  • All this is transparent to the SRU client and the Z39.50 server.
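The sequence of steps above can be sketched as a single Python function. This is an illustration only, not YAZ Proxy's code: the name `handle_sru_search` and its three callable parameters are hypothetical, the real Z39.50 exchange is replaced by an injected function, the call is synchronous rather than asynchronous, and the MARCXML record schema identifier is an assumption modelled on the Dublin Core identifier shown in the earlier sample response.

```python
from typing import Callable, List, Tuple


def handle_sru_search(
    cql_query: str,
    translate_query: Callable[[str], str],
    z3950_search: Callable[[str], Tuple[int, List[bytes]]],
    marc_to_marcxml: Callable[[bytes], str],
) -> str:
    """Turn one SRU search into one Z39.50 search and wrap the results."""
    type1_query = translate_query(cql_query)               # CQL -> Type-1
    hit_count, marc_records = z3950_search(type1_query)    # request + response
    xml_records = [marc_to_marcxml(raw) for raw in marc_records]

    parts = [
        '<zs:searchRetrieveResponse xmlns:zs="http://www.loc.gov/zing/srw/">',
        "<zs:version>1.1</zs:version>",
        f"<zs:numberOfRecords>{hit_count}</zs:numberOfRecords>",
        "<zs:records>",
    ]
    for position, xml_record in enumerate(xml_records, start=1):
        parts.append(
            "<zs:record>"
            # Schema identifier below is an assumption of this sketch.
            "<zs:recordSchema>info:srw/schema/1/marcxml-v1.1</zs:recordSchema>"
            "<zs:recordPacking>xml</zs:recordPacking>"
            f"<zs:recordPosition>{position}</zs:recordPosition>"
            f"<zs:recordData>{xml_record}</zs:recordData>"
            "</zs:record>"
        )
    parts += ["</zs:records>", "</zs:searchRetrieveResponse>"]
    return "\n".join(parts)


# Stand-in steps so the sketch runs without a network or a real catalogue.
response = handle_sru_search(
    "dinosaur",
    translate_query=lambda cql: "@attr 1=1016 " + cql,
    z3950_search=lambda query: (29, [b"raw MARC bytes"]),
    marc_to_marcxml=lambda raw: '<record xmlns="http://www.loc.gov/MARC21/slim"/>',
)
print(response)
```

Passing the three steps in as functions keeps the gateway logic independent of any particular Z39.50 library, which mirrors the point that neither end needs to know about the other.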

42
The sauropod dinosaur Brachiosaurus
(It's been a while since we had a picture.)
43
YAZ Proxy in detail: performance features
  • Access to the LC catalogue -- whether by Z39.50 or SRU -- is much
    faster through YAZ Proxy than directly.
      • YAZ Proxy re-uses a pool of initialised back-end sessions
      • It can pre-cache a set of ready-to-use back-end sessions
      • Query-caching avoids repeated identical searches
      • Record-caching allows repeated requests for the same record to
        be instantaneous
  • The total effect is that access via YAZ Proxy is typically 10-100
    times faster. (Source: Larry Dixson of the Library of Congress.)
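The idea behind query-caching can be illustrated with a few lines of Python. This is a generic least-recently-used cache, not YAZ Proxy's actual mechanism; the class name `QueryCache`, the `backend_search` callable and the `max_entries` limit are all hypothetical.

```python
from collections import OrderedDict


class QueryCache:
    """Remember recent search results so identical queries skip the back-end."""

    def __init__(self, backend_search, max_entries=100):
        self._backend_search = backend_search
        self._max_entries = max_entries
        self._results = OrderedDict()
        self.backend_calls = 0          # counts real searches, for the demo

    def search(self, query):
        if query in self._results:
            self._results.move_to_end(query)   # mark as most recently used
            return self._results[query]
        self.backend_calls += 1
        result = self._backend_search(query)
        self._results[query] = result
        if len(self._results) > self._max_entries:
            self._results.popitem(last=False)  # evict least recently used
        return result


# A stand-in back-end; a real one would send a Z39.50 Search Request.
cache = QueryCache(lambda query: ["record for " + query], max_entries=2)
cache.search("dinosaur")
cache.search("dinosaur")        # served from the cache
print(cache.backend_calls)      # 1
```

A production cache would also need an expiry policy, since catalogue records change over time.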

44
YAZ Proxy in detail: load balancing
YAZ Proxy can be configured to balance load across multiple back-end
Z39.50 servers. Queries are generally sent to the least heavily loaded
back-end. This allows a heavily-used service to be scaled across
multiple servers, distributed and made robust against system failure.
(Arrangements must be made to keep the multiple copies up to date and
synchronised.)
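A least-loaded selection policy of the kind described here might look like the following Python sketch. It assumes that "load" means the number of requests currently in flight on each back-end, which is one plausible measure among several; the class name `LeastLoadedBalancer` and the back-end names are hypothetical.

```python
class LeastLoadedBalancer:
    """Pick the back-end with the fewest requests currently in flight."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one back-end is required")
        self._active = {name: 0 for name in backends}

    def acquire(self):
        """Choose a back-end for a new request and count it as busy."""
        name = min(self._active, key=self._active.get)
        self._active[name] += 1
        return name

    def release(self, name):
        """Record that a request on the given back-end has finished."""
        if self._active[name] > 0:
            self._active[name] -= 1


balancer = LeastLoadedBalancer(["backend-a", "backend-b"])
first = balancer.acquire()     # both idle, so the first listed is chosen
second = balancer.acquire()    # the other back-end is now less loaded
balancer.release(first)
third = balancer.acquire()     # the released back-end is least loaded again
print(first, second, third)
```

Making the service robust against failure would additionally require detecting dead back-ends and removing them from the pool, which this sketch leaves out.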
45
YAZ Proxy in detail: query translation
Both CQL and the Z39.50 Type-1 query allow application-specific
extensions (e.g. geospatial searching, thesaurus navigation).
Translation from CQL to Type-1 is therefore driven by a simple
configuration file which maps CQL index-names, relations, etc. into
Z39.50 Type-1 query attributes.

  index.cql.serverChoice    = 1=1016
  index.rec.id              = 1=12
  index.dc.title            = 1=4
  index.dc.subject          = 1=21
  relation.lt               = 2=1
  relation.le               = 2=2
  relationModifier.relevant = 2=102
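To make the role of this mapping concrete, here is a toy Python translator for a single search clause. It is not YAZ Proxy's implementation: the function names are hypothetical, only one clause is handled (no booleans or modifiers), and the output is written in PQF, Index Data's textual notation for Type-1 queries, as a readable stand-in for the binary query. The sketch also assumes that the `=` relation adds no relation attribute, since the mapping on this slide lists none.

```python
CONFIG_TEXT = """
index.cql.serverChoice    = 1=1016
index.rec.id              = 1=12
index.dc.title            = 1=4
index.dc.subject          = 1=21
relation.lt               = 2=1
relation.le               = 2=2
relationModifier.relevant = 2=102
"""

# CQL relation symbols and the names used for them in the mapping.
RELATION_NAMES = {"<": "lt", "<=": "le"}


def load_mapping(text):
    """Parse 'key = value' lines; the value itself may contain '='."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")   # split on the first '=' only
        mapping[key.strip()] = value.strip()
    return mapping


def clause_to_pqf(mapping, index, relation, term):
    """Translate one 'index relation term' clause into a PQF string."""
    attributes = [mapping["index." + index]]
    if relation in RELATION_NAMES:
        attributes.append(mapping["relation." + RELATION_NAMES[relation]])
    elif relation != "=":
        raise ValueError("relation not covered by this sketch: " + relation)
    prefix = " ".join("@attr " + attribute for attribute in attributes)
    if " " in term:
        term = '"' + term.replace('"', '\\"') + '"'
    return prefix + " " + term


mapping = load_mapping(CONFIG_TEXT)
print(clause_to_pqf(mapping, "dc.title", "=", "dinosaur"))   # @attr 1=4 dinosaur
print(clause_to_pqf(mapping, "rec.id", "<", "100"))          # @attr 1=12 @attr 2=1 100
```

Because the translation is table-driven, supporting a new index is a matter of adding one line to the configuration rather than changing code.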
46
YAZ Proxy in detail: record translation
  • Translating MARC (ISO2709) records into MARCXML is a core function
    of YAZ Proxy.
  • It can also be configured to further transform the translated
    MARCXML records using arbitrary XSLT stylesheets.
  • Standard stylesheets support translation to:
      • Dublin Core
      • MODS
      • METS
  • Other formats, such as OAI_DC, are easy to support.
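For readers unfamiliar with the two formats, the following Python sketch shows roughly what the ISO 2709 to MARCXML step involves. It is a simplified illustration, not YAZ Proxy's code: it does no error checking, it assumes the record's text is UTF-8 (real catalogue records may use MARC-8), and `build_iso2709` is a hypothetical helper that exists only to create a small test record.

```python
import xml.etree.ElementTree as ET

MARCXML_NS = "http://www.loc.gov/MARC21/slim"
FIELD_END, SUBFIELD, RECORD_END = b"\x1e", b"\x1f", b"\x1d"
ET.register_namespace("", MARCXML_NS)   # serialise with a default namespace


def q(name):
    """Qualify an element name with the MARCXML namespace."""
    return "{" + MARCXML_NS + "}" + name


def marc_to_marcxml(raw, encoding="utf-8"):
    """Convert one ISO 2709 record (bytes) into a MARCXML string."""
    leader = raw[:24].decode("ascii")
    base = int(leader[12:17])               # where the field data starts
    directory = raw[24:base - 1]            # 12-byte entries, minus terminator
    record = ET.Element(q("record"))
    ET.SubElement(record, q("leader")).text = leader
    for pos in range(0, len(directory), 12):
        entry = directory[pos:pos + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        data = raw[base + start:base + start + length - 1]  # drop terminator
        if tag < "010":                     # control fields have no subfields
            field = ET.SubElement(record, q("controlfield"), {"tag": tag})
            field.text = data.decode(encoding)
            continue
        attrs = {"tag": tag, "ind1": chr(data[0]), "ind2": chr(data[1])}
        field = ET.SubElement(record, q("datafield"), attrs)
        for chunk in data[2:].split(SUBFIELD)[1:]:
            if chunk:
                sub = ET.SubElement(field, q("subfield"), {"code": chr(chunk[0])})
                sub.text = chunk[1:].decode(encoding)
    return ET.tostring(record, encoding="unicode")


def build_iso2709(fields):
    """Demo helper: pack (tag, data bytes) pairs into an ISO 2709 record."""
    directory, body = b"", b""
    for tag, data in fields:
        data += FIELD_END
        directory += f"{tag}{len(data):04d}{len(body):05d}".encode("ascii")
        body += data
    base = 24 + len(directory) + 1
    total = base + len(body) + 1
    leader = f"{total:05d}nam a22{base:05d} a 4500".encode("ascii")
    return leader + directory + FIELD_END + body + RECORD_END


sample = build_iso2709([
    ("001", b"12345"),
    ("245", b"10" + SUBFIELD + b"aFossils /" + SUBFIELD + b"cMegan Lappi."),
])
xml_text = marc_to_marcxml(sample)
print(xml_text)
```

The resulting XML could then be passed to an XSLT processor to produce Dublin Core or another target format, as the slide describes.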

47
But, Mike! This is too good to be true!
Yes.
48
But how do you people make a living?
  • Apart from living on good karma, we make money from:
      • Bespoke development (e.g. building YAZ Proxy)
      • Customisation (e.g. adding support for new XML formats)
      • Integration (e.g. making the proxy use local authentication)
      • Support contracts (but these are strictly optional)
      • Consultancy
  • We also provide services such as hosted SRU-to-Z39.50 gateways, so
    YOUR ORGANISATION could support SRU (and SRW) access, and
    accelerate its Z39.50 service, without requiring you to install
    any software.

49
Thanks for listening! You know where to find us.
http://indexdata.com/
Tel. 45 3341 0100
Fax. 45 3341 0101