Title: Lorcan Dempsey (with contributions from colleagues)
1 OCLC: some development and research directions in
the areas of metadata management and knowledge
organization. Presented to Library of Congress
cataloging managers retreat.
- Lorcan Dempsey (with contributions from colleagues)
- VP Research and Chief Strategist
- Library of Congress, 15 June 2004
2 Topics
Metadata management and knowledge organization
Framework for WorldCat directions
Open WorldCat
Working with web services
Some research, some production
Making data work harder
3 Framework for WorldCat directions
4 Collections grid
[Figure: grid with two axes, stewardship (high to low) and uniqueness (low to high)]
5 WorldCat: the what?
- WorldCat
- Grow
- Version
- Improve
- Easier to use (FRBR)
- Microcontent
- Evaluative content
The Open Web: both surface and acquire WorldCat
content
Add special collections and institutional content
to WorldCat: dissertations, cultural heritage
collections, Eprints, learning objects
6 WorldCat: the how?
7 Some issues
- Metadata variety
- Encoding, element sets, values/content
- Provenance
- Metadata manipulation
- Validation, identification
- Enhancement, augmentation
- Relation, FRBR, deduplication
- Transformation
- Schematization and web services
- Make data available in forms that allow machine
services to be flexibly built on top of them
- Everything is a service
8 Open WorldCat
9 Open WorldCat
- Facilitate the rendezvous of users and library
services on the web
- Surface the library where the users are
- Help release the value of library services in the
working and learning lives of their users.
10 Open WorldCat Architecture
WorldCat, additional collections can be added to
Worldcatlibraries domain
Metadata
Schemas and Vocabularies
OCLC developed geo-locator services to match
users to extensive FirstSearch WorldCat
institution and user profiles
Profiles and Relationships
OCLC uses a host of authentication and
authorization tools to progressively match
content to rights
Content Owner
Access
OCLC organizes WorldCat content in a model
suitable for harvesting, anticipating unique
aspects of various portals
Aggregators
Distribution, Search, Display
Google, Yahoo and Book Vendors
Portals
Organization and Presentation
11 Current partners
Click in presentation mode to go through
to examples
- Book vendors and bibliographies
- ABE Books
- ABAA
- Alibris
- HCBIB
- BookPage
- Search engines (pilot with 2M records exposed as
web pages for harvesting)
- Google
- Yahoo!
Try a search for "A history of caricature and
grotesque in literature and art"
12 Google and Yahoo! timeline
13 Traffic
Full record displays. Projected for June.
14 Metadata management and knowledge organization
15 Research activities
- Structures
- FRBR
- VIAF
- BT
- FAST
- Vocabulary encoding and mappings
- Services
- xISBN
- Metadata transformation services
- Terminology services
- Authority services
- Automatic classification and cataloging
- ePrints UK
- Web harvesting
16 FRBR
Click in presentation mode to go through
to FictionFinder
- OR Work-set algorithm
- Work-based view incorporated into WorldCat in
FirstSearch in late 2004
- FictionFinder
- 2.6 million fiction records from WorldCat,
clustered by OCLC's FRBR algorithm
- Make greater use of data (genres, settings,
imaginary characters, etc.)
- Participate in ongoing FRBR refinement
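The idea of clustering bibliographic records into work sets can be sketched roughly as follows. This is a simplified illustration that groups records by a normalized author and title key; it is not OCLC's FRBR algorithm, and the record layout, field names, and sample records are assumptions made for the example.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


def work_key(record):
    """Build a grouping key from author and title (an assumed, simplified key)."""
    return (normalize(record["author"]), normalize(record["title"]))


def cluster_into_works(records):
    """Group record ids that share the same normalized author/title key."""
    works = defaultdict(list)
    for record in records:
        works[work_key(record)].append(record["id"])
    return dict(works)


# Hypothetical sample records, for illustration only.
records = [
    {"id": "r1", "author": "Brontë, Charlotte", "title": "Jane Eyre"},
    {"id": "r2", "author": "Bronte, Charlotte", "title": "Jane Eyre."},
    {"id": "r3", "author": "Brontë, Emily", "title": "Wuthering Heights"},
]
works = cluster_into_works(records)
```

Records r1 and r2 differ only in accents and punctuation, so they fall into one work set, while r3 forms its own.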
17 FAST
18 Vocabulary mappings
19 Services
- Web services
- Computer to computer applications over the web
- Unplug and play
- Unbundling monolithic applications and making
functionality available in more modular ways
- Reuse and sharing
- Of services!
- Release the value in a web environment of the
historical library investment in vocabularies and
structures
20 xISBN
- An experimental web service
- Leverages FRBRization work
- Give it an ISBN, it returns all related ISBNs
- Based on WorldCat
- Designed for machine-to-machine data exchange
- Examples
- Check user ILL requests against all
editions/versions in OPAC
- Find library's editions when user finds any
edition/version of item on Amazon
- Check OPAC for all editions during
selection/acquisitions/gift book processing
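The "give it an ISBN, get related ISBNs" behaviour and the ILL check described above might look something like this in outline. The sketch uses an in-memory table of work sets rather than the real xISBN service, whose request and response formats are not given here; the placeholder identifiers and function names are hypothetical.

```python
# Hypothetical work sets: each set holds the ISBNs of editions of one work.
# Placeholder strings are used instead of real ISBNs.
WORK_SETS = [
    {"isbn-a1", "isbn-a2", "isbn-a3"},
    {"isbn-b1", "isbn-b2"},
]


def related_isbns(isbn, work_sets=WORK_SETS):
    """Return every ISBN in the same work set as the given ISBN."""
    for work in work_sets:
        if isbn in work:
            return sorted(work)
    return [isbn]  # unknown ISBN: it is only related to itself


def held_editions(requested_isbn, opac_isbns):
    """Check a request against all editions of the work held in the OPAC."""
    return sorted(set(related_isbns(requested_isbn)) & set(opac_isbns))


# A user requests edition a1; the library holds a3, another edition of the work.
matches = held_editions("isbn-a1", {"isbn-a3", "isbn-b1"})
```

Here the request for one edition surfaces a different held edition of the same work, which is the point of the ILL and acquisitions examples.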
21 xISBN
Install FRBR Bookmarklets in your browser to see
xISBN working. See the Bookmarklets page at
www.oclc.org/research/researchworks/
Click cover to search Seattle Public Library
Click cover to search amazon.co.uk
22 Metadata schema transformations
- Metadata Schema Transformation Services
- Evaluate approaches to crosswalking metadata
- Prototype transformation environments
- The XSLT short path
- Supports lightweight XML processing
- Designed for public access
- Deliverables
- OAI repository of METS-captured crosswalks (new)
- The long path option
- Designed for high-fidelity translations
- May be public or proprietary
- Deliverables: toolkit expertise in non-MARC
formats
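As a rough illustration of what a lightweight crosswalk does, the sketch below renames elements from a source record to a target record using a lookup table. It uses a plain Python dictionary as a stand-in for an XSLT stylesheet, and the element names, mapping, and sample record are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical crosswalk: source element name -> target element name.
CROSSWALK = {"title": "name", "creator": "author", "subject": "topic"}


def apply_crosswalk(source_xml, crosswalk=CROSSWALK, target_root="record"):
    """Copy mapped elements into a new record; report elements with no mapping."""
    source = ET.fromstring(source_xml)
    target = ET.Element(target_root)
    unmapped = []
    for child in source:
        target_name = crosswalk.get(child.tag)
        if target_name is None:
            unmapped.append(child.tag)  # keep track of what the crosswalk drops
            continue
        ET.SubElement(target, target_name).text = child.text
    return target, unmapped


source_xml = (
    "<metadata>"
    "<title>Example title</title>"
    "<creator>Example, Author</creator>"
    "<format>text</format>"
    "</metadata>"
)
target, unmapped = apply_crosswalk(source_xml)
print(ET.tostring(target, encoding="unicode"))
```

Reporting unmapped elements makes the loss of information visible, which is one of the things a higher-fidelity "long path" translation would need to address.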
24 A crosswalk as a METS record
- Describe the crosswalk object in the METS header.
- Assemble and identify six objects in the METS
structural map
- The source metadata schema
- The target metadata schema
- The crosswalk
- Human-readable and executable versions of each
- Associate metadata for each file in the METS
Descriptive Metadata Section.
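The steps above can be sketched as a small script that assembles a skeletal METS document: a header, one descriptive metadata section per file, a file section, and a structural map pointing at six files (human-readable and executable versions of the source schema, target schema, and crosswalk). The METS element names and namespace are standard, but the labels, identifiers, and file names are hypothetical, the descriptive metadata itself is left empty, and the output is not claimed to be schema-valid.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)

COMPONENTS = ["source-schema", "target-schema", "crosswalk"]
FORMS = ["human-readable", "executable"]


def q(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_crosswalk_mets(label):
    mets = ET.Element(q("mets"), {"LABEL": label})
    ET.SubElement(mets, q("metsHdr"))  # header describing the crosswalk object
    dmd_secs, files = [], []
    root_div = ET.Element(q("div"), {"LABEL": label})
    for component in COMPONENTS:
        comp_div = ET.SubElement(root_div, q("div"), {"LABEL": component})
        for form in FORMS:
            file_id = f"{component}-{form}"  # hypothetical identifier scheme
            dmd = ET.Element(q("dmdSec"), {"ID": f"dmd-{file_id}"})
            ET.SubElement(dmd, q("mdWrap"), {"MDTYPE": "OTHER"})  # metadata omitted
            dmd_secs.append(dmd)
            file_el = ET.Element(q("file"), {"ID": file_id})
            ET.SubElement(
                file_el,
                q("FLocat"),
                {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": f"{file_id}.dat"},
            )
            files.append(file_el)
            # Each leaf div links one file to its descriptive metadata section.
            leaf = ET.SubElement(
                comp_div, q("div"), {"LABEL": form, "DMDID": f"dmd-{file_id}"}
            )
            ET.SubElement(leaf, q("fptr"), {"FILEID": file_id})
    mets.extend(dmd_secs)
    file_grp = ET.SubElement(ET.SubElement(mets, q("fileSec")), q("fileGrp"))
    file_grp.extend(files)
    ET.SubElement(mets, q("structMap")).append(root_div)
    return mets


mets = build_crosswalk_mets("example crosswalk")
```

Serializing the result with `ET.tostring(mets, encoding="unicode")` gives a single XML object that carries both the readable and the executable form of each component.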
25 Crosswalk METS record in OAI repository
28 What the METS encoding solves
- The semantic and syntactic information required
for interpreting and executing a crosswalk is
collected into a single object.
- The repository is searchable by humans and
automated processes.
- Services can be built on top of it.
- It encourages the development and standardization
of crosswalks.
These outcomes are possible because every
component in the system is a standard.
29 Terminology Services
- Terminology services are web services for
knowledge organization schemes (kos)
- e.g., authority files, subject heading systems,
thesauri, taxonomies, and classification schemes
- A web service that provides mappings from a term
in one vocabulary to one or more terms in another
vocabulary is an example of a terminology service
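The mapping service described above could be reduced, for illustration, to a lookup that takes a term and a pair of vocabularies and returns zero or more target terms. The vocabulary names, terms, and table layout below are placeholders invented for the sketch, not real mappings or a real service interface.

```python
# Hypothetical mapping table keyed by (source vocabulary, target vocabulary).
MAPPINGS = {
    ("vocab-a", "vocab-b"): {
        "term-a1": ["term-b1"],
        "term-a2": ["term-b2", "term-b3"],  # one term may map to several
    },
}


def map_term(term, source, target, mappings=MAPPINGS):
    """Return the target-vocabulary terms mapped from a source-vocabulary term."""
    return list(mappings.get((source, target), {}).get(term, []))


one_to_many = map_term("term-a2", "vocab-a", "vocab-b")
no_mapping = map_term("term-a9", "vocab-a", "vocab-b")
```

Returning an empty list for an unmapped term, rather than raising an error, lets a calling application treat "no mapping" as an ordinary result.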
30 Current Situation
- A plethora of vocabularies
- Many encoding formats
- Few inter-vocabulary connections
- Identifiers inadequate
- Unavailable
- Temporary
- Inconsistent
31 Terminology services system framework
- Schema transformation
- MARC XML
- SKOS
- Zthes
- Record enhancement
- Inter-vocabulary mappings
- Persistent identifiers (info:uri)
- Access
- Human-readable
- Browse interface (ERRoLs)
- Search/retrieve records (SRU/W)
- Switch between schema-specific views (XSLT)
- Machine-to-machine (m2m)
- Publishing (OAI)
- Search/retrieve records (SRU/W)
- info:uri resolution (OpenURL)
- Open standards
- MARC 21
- XML/XSLT/XPath
- SKOS
- Zthes
- SRU/SRW
- OAI
- info:uri
- OpenURL
- Open source software
- OCLC OAICat
- OCLC SRU/SRW server
- OCLC ERRoL J2EE webapp
- Open content
- GSAFD, others
- Open access
- Web services-oriented
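For the SRU search/retrieve access path listed above, a client request is an ordinary URL with a few standard parameters. The sketch below builds such a URL; the parameter names (`operation`, `version`, `query`, `maximumRecords`) come from the SRU specification, while the base URL and the query string are hypothetical and do not refer to an actual OCLC endpoint.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_sru_url(base_url, cql_query, max_records=10, version="1.1"):
    """Build an SRU searchRetrieve request URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": cql_query,
        "maximumRecords": str(max_records),
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and query, for illustration only.
url = build_sru_url("http://example.org/sru/vocabulary", '"historical fiction"')
sent = parse_qs(urlsplit(url).query)
```

Because the request is just a URL, the same call can be issued by a browser, a script, or another service.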
32 Schema Transformation
- MARC XML
- Authority Format, Classification Format
- SKOS
- Simple Knowledge Organization System
- Zthes
- Z39.50 Profile for Thesaurus Navigation.5
- Based on Z39.19 (NISO Thesaurus Standard)
33 Vocabulary Processing
schema transformation
data enhancement
- Add
- provenance (MARC Org. Codes)
- persistent identifiers (info:kos)
- Conversion from most
- formats
- Z39.19
- wordlists in PDF, etc.
- Optionally, add
- inter-vocabulary mappings
- Concepts terms
- persistent identifiers
- (info:kos)
- Initial conversion to
- MARC XML
- Authorities format, or,
- Classification format
34 Info:kos
- info:uri
- provides a mechanism for the registration of
public namespaces that are used for the
identification of information assets
- The kos identifier
- provides a mechanism for identifying knowledge
organization schemes and the concepts used in
those schemes. It has two elements:
- scheme
- concept
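A two-element identifier of this kind can be modelled as a small value type with a parser. The slide does not give the identifier's syntax, so the string layout used below (an `info:kos/` prefix followed by scheme and concept separated by a slash) is purely an assumption for illustration, as are the sample values.

```python
from dataclasses import dataclass

# Assumed prefix and layout; the actual syntax is not stated in the slides.
PREFIX = "info:kos/"


@dataclass(frozen=True)
class KosIdentifier:
    scheme: str   # identifies the knowledge organization scheme
    concept: str  # identifies a concept within that scheme

    def __str__(self):
        return f"{PREFIX}{self.scheme}/{self.concept}"


def parse_kos_identifier(text):
    """Split an identifier string into its scheme and concept elements."""
    if not text.startswith(PREFIX):
        raise ValueError(f"not a kos identifier: {text!r}")
    scheme, sep, concept = text[len(PREFIX):].partition("/")
    if not scheme or not sep or not concept:
        raise ValueError(f"expected scheme and concept in {text!r}")
    return KosIdentifier(scheme, concept)


# Hypothetical scheme and concept values.
identifier = parse_kos_identifier("info:kos/example-scheme/concept-123")
```

Keeping scheme and concept as separate fields lets a resolver route on the scheme before looking up the concept.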
35 New services environment
41 Name authority lookup
Lorcan Dempsey
- Interactive
- As a web service
- An example authority control service invoked
from within DSpace?
Click in presentation mode.
46 Working with web services
47 Making data work harder
48 Data mining
- Research
- Production
- Collection analysis service in development phase
- Leverages WorldCat data in interactive mode
- Compare my collection to my peers
- Compare my collection to my neighbors
- Profile my collection by subject, by age, etc.
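Comparing a collection with those of peers comes down to set operations over holdings. The sketch below shows one way to compute unique holdings, gaps, and overlap; the title identifiers, library names, and report layout are invented for the example and do not describe the collection analysis service itself.

```python
def compare_collections(mine, peers):
    """Compare one library's holdings with a dict of peer holdings."""
    mine = set(mine)
    held_by_peers = set().union(*peers.values())
    return {
        # Titles no peer holds.
        "unique_to_me": sorted(mine - held_by_peers),
        # Titles at least one peer holds that I do not.
        "gaps": sorted(held_by_peers - mine),
        # Number of shared titles per peer.
        "overlap": {name: len(mine & set(held)) for name, held in peers.items()},
    }


# Hypothetical holdings, identified by placeholder title ids.
my_holdings = {"t1", "t2", "t3"}
peer_holdings = {"peer-x": {"t2", "t4"}, "peer-y": {"t3", "t4", "t5"}}
report = compare_collections(my_holdings, peer_holdings)
```

The same report could be broken down further by subject or publication date if those attributes were attached to each title.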
49 Collection
- Change creates demand for better data.
- Growing interest in knowing more about
- Characteristics
- Gaps and overlaps
- Use
- Tuning collections based on data.
- Focus collection spending where it creates most
value.
50 Some projects
- Characteristics of collections
- WorldCat
- CIC
- Compare ILL, circulation and holdings data.
- Last copy: what is irreplaceable?
- ARL Global Resources.
- Exploring coverage of overseas titles in ARL
libraries.
- Depends on consistency, coverage, currency
51 Comparing CIC Collection Profiles
52 Audience level
Forge
Letters
53 Profiles of Letters Forge Example
54 Topics
Metadata management and knowledge organization
Framework for WorldCat directions
Open WorldCat
Working with web services
Some research, some production
Making data work harder
55 Thoughts
- Machines will do more work
- Consistency becomes more important
- Variety
- Low precision
- Make data work
56 The pattern is new
The knowledge imposes a pattern, and falsifies,
For the pattern is new in every moment
57 Further information
Thanks to colleagues in OCLC Research
for contributions to this presentation. Further
information about OCLC Research projects can be
found at http://www.oclc.org/research/
Thanks to colleagues in OCLC Collection
Management Services for contributions to this
presentation. Further information about Open
WorldCat at http://www.oclc.org/worldcat/pilot/