Title: OCLC and FRBR: directions and research results
1. OCLC and FRBR: directions and research results
- Lorcan Dempsey, with contributions from Diane
Vizine-Goetz, Ed O'Neill, Thom Hickey and Eric
- Revolution or Evolution? The impact of FRBR
(Functional Requirements for Bibliographic
Records). Organized by the Australian Committee
on Cataloging. Melbourne Convention Centre, 2
February 2004
- OCLC research work
- OCLC production plans
- Some issues
- Long standing interest in work-based approaches
- The Humphry Clinker problem
- Strong practical interest
- End-user presentation
- Cataloging: help find records
- Collection analysis
- Data enrichment
4. OCLC Research and FRBR
- Mining the data
  - Ed O'Neill
- Algorithmically FRBRizing
  - Thom Hickey
- Work-based prototypes
  - Diane Vizine-Goetz
  - Thom Hickey
5. Mining the data
- Analyzing representations of a single work in
detail.
- Tested OCLC Research conversion algorithm against
1000 works.
6. Types of Works
- Elemental Works have only a single manifestation
(78%)
- Simple Works have only a single expression but
multiple manifestations (16%)
- Complex Works have multiple expressions (6%)
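The three categories above can be written as a small decision rule. This is only a sketch: the `Manifestation` record and its identifier fields are hypothetical names introduced for illustration, and it assumes each manifestation has already been assigned to an expression, which the slides elsewhere note is difficult in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Manifestation:
    # All three identifiers are hypothetical, for illustration only.
    work_id: str
    expression_id: str
    manifestation_id: str


def classify_work(manifestations):
    """Classify one work as 'elemental', 'simple', or 'complex'."""
    if not manifestations:
        raise ValueError("a work needs at least one manifestation")
    expressions = {m.expression_id for m in manifestations}
    items = {m.manifestation_id for m in manifestations}
    if len(expressions) > 1:
        return "complex"    # multiple expressions
    if len(items) > 1:
        return "simple"     # one expression, several manifestations
    return "elemental"      # a single manifestation
```

Applied to a sample of works, counting the three labels would give the kind of distribution reported on the slide.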
7. Principal Types of Complex Works
- Translations
- Augmented
- Revised
- Collected/Selected
- All translations are expressions
- Other types of complex works frequently include
9. Typical Augmented Work
The Expedition of Humphry Clinker
- 48 Expressions
- 114 Manifestations
- Expressions created by augmentation with notes,
introductions, illustrations, bibliographies,
glossaries, etc.
10. Typical Revised Work
- 1st and 2nd editions are by John Phillip Immroth
- 3rd and 4th editions are by Lois Mai Chan, and
"Immroth's" was added to the title
11. Collected Works
- A collection of items, each of which is a distinct
intellectual or artistic creation: a collection
of works
- 50% of collected works explicitly list
component works.
- Expressions not clear.
- Bring out the differences that matter.
- Retrospective activity constrained by available
bibliographic data.
- Empirical work will support ongoing clarification
of the model (Working group on the expression
13. Algorithmically FRBRizing
- The OCLC Research work set algorithm
14. Our Approach
- Concentrating on work-level
- Problems with expression-level clusters
- Efficient, maintainable, understandable
- Useful matches with correct cataloging
- Err on the side of missed matches
- Some accommodation of frequent variants (e.g.
"Shakespeare's Hamlet" treated as "Hamlet")
- Compare with manually clustered
- Reliable at work level. Expression level not
clear enough.
15. The Algorithm
- A key is generated for each record
- Extract author, title
- Look up in NACO authority file
- Added entry information as needed
- Form a key from bibliographic record
- Author, title, added entry information
- These can be sorted, compared
- Manual estimate: 1.5 manifestations/work in
WorldCat
- Algorithm: 1.27
- 25,000 clusters have >20 records
- 415,000 clusters have >4 records
- 30% of records and 50% of holdings are in a cluster
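The key-generation steps listed on the algorithm slide (extract author and title, look up the author in an authority file, form a key, then sort or compare keys) can be sketched roughly as follows. This is not the OCLC Research work set algorithm itself: the normalization rules, the record field names, the tiny `AUTHORITY` table standing in for a NACO lookup, and the choice to use added-entry information only when no author is present are all assumptions made for illustration.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical stand-in for a NACO authority lookup:
# normalized variant name -> normalized established heading.
AUTHORITY = {
    "smollett tobias": "smollett tobias george 1721 1771",
}


def normalize(text):
    """Lowercase, strip diacritics and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]+", " ", no_marks.lower()).strip()


def work_key(record):
    """Form an author/title key; fall back on added-entry data (assumption)."""
    author = normalize(record.get("author", ""))
    author = AUTHORITY.get(author, author)  # map variants to one heading
    title = normalize(record["title"])
    added = normalize(record.get("added_entry", ""))
    parts = [author, title] if author else [title, added]
    return "/".join(parts)


def cluster(records):
    """Group records whose keys are identical into work clusters."""
    clusters = defaultdict(list)
    for record in records:
        clusters[work_key(record)].append(record)
    return dict(clusters)


if __name__ == "__main__":
    sample = [
        {"author": "Smollett, Tobias", "title": "The Expedition of Humphry Clinker."},
        {"author": "Smollett, Tobias George, 1721-1771", "title": "The expedition of Humphry Clinker"},
        {"author": "Bail, Murray", "title": "Eucalyptus"},
    ]
    result = cluster(sample)
    print(len(sample) / len(result), "records per cluster")
```

Exact key matching is deliberately conservative, in line with the stated preference to err on the side of missed matches: two records only cluster when their normalized keys are identical.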
17. Work-based prototypes
- A prototype system of 2.6 million bibliographic
records for fiction clustered according to the
OCLC FRBR work set algorithm
- Uses the FRBR model to organize, index, and
display bibliographic elements of potential
interest to users
19. Fiction Subset
- 2,665,662 WorldCat records (fiction indicator)
- 1,758,479 work clusters
- 1.5 records/cluster
- 3,866 clusters have 20 or more records
- 50,540 clusters have 5 or more records
20. Most widely held fiction works
21. FictionFinder FRBR
- Information that applies to all expressions of a
given work, such as summaries, genre terms, and
subjects, is given precedence in work/expression-level
screen displays.
- Because of the difficulty of consistently
identifying expressions, manifestations are
organized by language of expression
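Organizing a work's manifestations by language, as described above, amounts to a simple grouping step. The sketch below assumes each manifestation is a dictionary with hypothetical `title`, `year`, and `language` fields; it is an illustration of the idea, not FictionFinder's actual implementation.

```python
from collections import defaultdict


def group_by_language(manifestations):
    """Group manifestations by language, oldest first within each group."""
    groups = defaultdict(list)
    for m in manifestations:
        # Records without a language value go into one catch-all group.
        groups[m.get("language") or "undetermined"].append(m)
    return {
        language: sorted(items, key=lambda m: m["year"])
        for language, items in sorted(groups.items())
    }
```

Language is usually recorded consistently in bibliographic records, which is what makes it a workable proxy when finer expression-level distinctions cannot be identified reliably.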
24. Work display
25. Work/expression display
26. FictionFinder FRBR
- Some characteristics of an expression, such as
expression title, e.g.,
  - Harry Potter and the Philosopher's Stone vs.
  - Harry Potter and the Sorcerer's Stone
are presented at the Work/Expression level
- Other less clear-cut distinctions between
expressions/manifestations, such as Braille
and electronic book versions, are presented at
both the Work/Expression level and the
Manifestation level.
27. Work/expression/manifestation display
- An experimental web service
- xISBN server receives a single ISBN and returns a
list of all ISBNs for the work cluster
- Designed for machine-to-machine data exchange
- Can return list in XML or XHTML
- Supports automatic expansion of ISBN searches
- Check user ILL requests against all
editions/versions in OPAC
- Use xISBN bookmarklet to find local library's
editions when user finds any edition of item on
Amazon, etc.
- Quickly check OPAC for all editions/versions
during selection/acquisitions/gift book processing
Eucalyptus / Murray Bail 1998 Melbourne Text
Pub. ISBN 1875847634
<?xml version="1.0" encoding="UTF-8" ?>
<idlist>
<isbn>1875847634</isbn>
<isbn>1860464947</isbn>
<isbn>1860464955</isbn>
<isbn>963859313x</isbn>
<isbn>2221087615</isbn>
<isbn>9532060065</isbn>
<isbn>9657120055</isbn>
</idlist>
Eucalyptus 1998 Melbourne Text Pub.
Eucalyptus 1998 London Harvill Press
Eucalyptus 1999 London Panther
Eukaliptusz 1999 Budapest Ulpius-ház Hungarian
Eucalyptus 1999 Paris R. Laffont French
Eukaliptus 1999 Zagreb Meandar Croatian
Ekaliptus 2001 Tel Aviv Hargol Hebrew
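A client consuming a response like the one above might parse the ISBN list and check it against local holdings, as in the ILL and acquisitions uses mentioned earlier. The sketch below works on a stored copy of the sample response rather than calling the service, since the slides give no endpoint. The function names and the ISBN normalization (dropping hyphens and spaces, uppercasing a trailing "x") are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Stored copy of the sample response shown above (no network call is made).
SAMPLE_RESPONSE = b"""<?xml version="1.0" encoding="UTF-8" ?>
<idlist>
  <isbn>1875847634</isbn>
  <isbn>1860464947</isbn>
  <isbn>1860464955</isbn>
  <isbn>963859313x</isbn>
  <isbn>2221087615</isbn>
  <isbn>9532060065</isbn>
  <isbn>9657120055</isbn>
</idlist>"""


def normalize_isbn(isbn):
    """Drop hyphens and spaces; uppercase a trailing check character 'x'."""
    return isbn.replace("-", "").replace(" ", "").upper()


def parse_idlist(xml_bytes):
    """Return the normalized ISBNs in an <idlist> response, in document order."""
    root = ET.fromstring(xml_bytes)
    return [normalize_isbn(el.text) for el in root.iter("isbn") if el.text]


def held_locally(cluster_isbns, local_isbns):
    """Return the ISBNs from the work cluster that the local catalog holds."""
    local = {normalize_isbn(i) for i in local_isbns}
    return [i for i in cluster_isbns if normalize_isbn(i) in local]


if __name__ == "__main__":
    cluster_isbns = parse_idlist(SAMPLE_RESPONSE)
    print(held_locally(cluster_isbns, ["1-86046-494-7"]))
```

Matching on the whole cluster instead of the single requested ISBN is what lets a library find that it already holds a different edition of the same work.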
30. Searching for the book on Amazon
31. LibraryLookup bookmarklet
Is the book at my library?
32. xISBN bookmarklet
Is the book at my library?
33. OCLC production plans
- FRBR in FirstSearch (end-user searching)
  - End 2004 as part of broader searching
  enhancement.
  - Present users with view most relevant to them
  (work, manifestation, ...)
- FRBR and cataloging
  - Interested in potential for FRBRization
  services
  - Use FRBR as aid to finding cataloging copy
  - FRBR view of cataloging yet to be discussed.
34. Some issues
- Data. Variations in cataloging practice and
errors or omissions in transcription and input
lead to false clusters
- Systems. Support in library management and other
systems.
- Agreement and shared practice. Theoretical
discussion needs to be informed by practice. The
detail!
- Communications format. How to share works etc.
Different internal implementations.
35. Further information
- Projects
- Publications
- ResearchWorks (soon)
- Software (algorithm)