Title: Slide number 1
1 Bibliographic relationships
Erik Thorlund Jepsen, The Danish Library Agency
2 Outline
- Bibliographic relations
  - Definitions and types
- Importance of bibliographic relationships
  - Searching (navigating), identifying and selecting
- Typologies
- Utilizing bibliographic relationships in OPACs and other search tools (e.g. integrated search)
  - By cataloguing
  - By automatic means
  - Employing statistical measures
- Conclusion
3 Definition of bibliographic relationships
- A relationship between information entities exists when two entities are somehow associated with each other (Vellucci, 1997, p. 105).
- Though semantically precise, this definition does not provide much guidance for identifying relationships, since associations rely on subjective judgements about the relevance of connecting two or more entities.
4 Bibliographic entities (FRBR)
5 Relations and associations
- Who identifies a relation (who associates)?
  - Authors, publishers, indexers/cataloguers, users, system/rule-based methods (incl. user tracks)
- What are the associations based on?
  - Information in entities?
    - About another entity
    - About a relation between two other entities
  - Similarity
  - Use (and users)
6 Examples
- Book is based on ... (work, expression, manifestation)
- Book is the 3rd edition of ...
- In his book, Peter Ingwersen mentions the findings of Pia Borlund in ... (work, expression, manifestation)
- A novel is part of an anthology
- Books X and Y are written by the same author
- Books X and Y share two descriptors
- Person A borrows book X and CD Y
- Article X and article Y are cited in article Z
- Books X and Y share a lot of words
- Books X and Y share a lot of references
7 Importance of relations
- A little bit about FRBR and relations
- Examples
- Identification and selection
- Use of linkages
- Faceted search facilities
- Importance as stated by FRBR and GFOD
8 FRBR - tasks
- to find entities that correspond to the user's stated search criteria (i.e., to locate either a single entity or a set of entities in a file or database as the result of a search using an attribute or relationship of the entity)
- to identify an entity (i.e., to confirm that the entity described corresponds to the entity sought, or to distinguish between two or more entities with similar characteristics)
- to select an entity that is appropriate to the user's needs (i.e., to choose an entity that meets the user's requirements with respect to content, physical format, etc., or to reject an entity as being inappropriate to the user's needs)
- to acquire or obtain access to the entity described (i.e., to acquire an entity through purchase, loan, etc., or to access an entity electronically through an online connection to a remote computer).
- (Functional Requirements for Bibliographic Records, 1998, p. 82)
9 FRBR additional tasks?
- To relate: a fifth task?
  - Even more, FRBR reminds us of the importance of bibliographic relationships, and reminds us that we describe things in the bibliographic universe in order to meet specific user tasks: find, identify, select, obtain, and I add relate (Tillett, 2005, p. 198).
  - Yet information about relationships supports the three tasks to find, to identify and to select (e.g. it supports collocation, which is seen as part of to find).
  - In other words, to relate is a subtask of to find, to identify and to select.
  - Incorporating to relate as a fifth task could cause a breakdown of the model.
- To navigate: a fifth task?
  - Yes, often among groups of entities.
10 Importance of relations: FRBR terms
- Information about bibliographic relations between two or more bibliographic entities can support the user tasks find, identify and select. Relations stated in a bibliographic record can potentially:
  - Improve users' understanding of a given entity, which potentially strengthens the identification and selection/deselection of the entity.
  - Improve users' options for finding relevant entities by leading the way from a known (found) entity to related entities that are more relevant in a given situation.
11 Importance of relations
- Furthermore, information about bibliographic relationships can strengthen the user's understanding of the system (database) at hand and of the knowledge organization in the system, by:
  - Creating groups of entities, and
  - Facilitating navigation in the bibliographic universe (the database/catalogue)
12 Example: identification and selection
- Draznin, Sandra Leigh: Børnenes restaurant
- The book is based on the TV series Børnenes restaurant and contains simple recipes that 8-12-year-olds can make themselves. It covers starters, main courses, desserts and snacks.
- BOOK. 1st edition, 1st printing. TV 2, 2007
- Recipes by Sandra Leigh Draznin; photos by Jes Buusmann; recipes on pages 20, 32, 42, 56 and 74 by Thomas Castberg Larsen; foreword and tips by Steffen Bjergved and Thomas Castberg Larsen; afterword by Carina Christensen
- 1st edition. 2007. 92 pages, illustrated in colour
- Publisher: TV 2
- Form: cookbooks, recipes
- Shelving in public libraries: 64.1
- The library recommends: from age 10
- ISBN-13: 978-87-92121-13-4
- Price at publication: DKK 199.00
13 Example: linking
16 Example: faceted search
18 Importance of linking: Danish example
- Analysis of searches in bibliotek.dk (20,506 searches, 20 December 2004)
  - Free text: 7%
  - Author: 34%
  - Link, author: 5%
  - Title: 20%
  - Descriptor/keyword: 11%
  - Link, descriptor/keyword ("more like this" and "literature about ..."): 15%
  - Other (each max. 2%), sum: 8%
- Source: Kirsten Larsen, Deputy Head, The Danish Library Centre (DBC)
19 Importance: GFOD
- User Principle: general guidelines for good practice in display design and criteria for effective screen displays as these relate to legibility, clarity, understandability and navigability
- Content and Arrangement Principle 7: support navigation from the displayed information to related information
  - (This principle is further divided into more specific and, I would add, ambitious principles.)
20 Bibliographic relationships: identification and typologies
- Associations/relations can be identified by analyzing:
  - Sets of documents
  - Existing information systems
  - Standards, rule sets and registration formats
  - Empirical studies of users' identification, and assessment of the importance, of associations among groups of entities
21 Typologies
- Categories (and what each holds between)
  - Equivalence relationships (copies, facsimiles, microforms and other similar reproductions)
  - Derivative relationships (versions, editions, revisions, translations)
  - Descriptive relationships (annotated editions, commentaries, reviews)
  - Whole-part relationships (selections from anthologies, collections, series, chapters vs. books)
  - Accompanying relationships (supplements, concordances, indexes)
  - Sequential relationships (sequels of a monograph, parts of a series)
  - Shared characteristics relationships (common author, publisher, title, subject)
- Vellucci and Tillett: categories of relations (shortened description from Vellucci, 1997)
22 Content relationships (equivalence, derivative and descriptive) are sometimes hard to distinguish in practice.
26 Typology: FRBR
- Relationships depicted in the high-level diagrams
- Other relationships between Group 1 entities at these levels:
  - Work-to-work
  - Expression-to-expression
  - Expression-to-work
  - Manifestation-to-manifestation
  - Manifestation-to-item
  - Item-to-item
  - Whole/part at work, expression, manifestation and item level
- Not meant to be exhaustive!
- Yet relationships are mapped to user tasks (alongside attributes)
27 Utilization
- Three purposes when cataloguing information about relations and setting up system rules:
  - Identification and understanding of the relation
  - Linking from a found entity to related entities
  - Displaying meaningful/useful sets of records
28 Relations expressed as links
- Relations are expressed as implicit or explicit links, where explicit links are divided into directional and mechanical links (hyperlinks) (Vellucci, 1997)
- Hyperlinks are constructed by manual or computational means
  - Manual links are static and are commonly used to structure texts or to connect associatively related entities (by topic) (and to connect bibliographic families; added by ETJ)
  - Computational links can be created at search time (dynamically) and are primarily used to connect similar entities (e.g. based on shared characteristics; added by ETJ)
- (Agosti, 1997)
29 Traditional means vs. new ways
- Traditionally, relations are handled in very different ways, because of:
  - Different types of relationships being handled differently
  - Non-specific rules
  - Variation between library systems
  - Differences in cataloguing policies
  - ...
30 Traditional means vs. new ways (2)
- Traditional methods include (very generalized):
  - Work, expression, manifestation and (sometimes) item information included in one record
    - including notes on predecessors
    - often the case for part-whole relations
  - Hyperlink structures based on controlled information about author, subject and (sometimes) title (most commonly links from authors, descriptors and classification numbers)
31 Traditional means vs. new ways (3)
- Initiatives to strengthen the utilization of bibliographic relations can be divided into:
  - Initiatives that try to put the display of relationships on the agenda (e.g. GFOD)
  - Initiatives that rely on manual work and further development of rule sets and registration formats (e.g. Reuse)
  - Initiatives that add links to the display of bibliographic records by controlling data and implementing local rules for displays (ad hoc, local or system-specific initiatives)
  - Initiatives that try to collocate the different expressions and manifestations of the work (e.g. bibliotek.dk)
  - Initiatives that try to structure the catalogue (existing data) according to FRBR (e.g. FRBR Display Tool)
  - Faceted search facilities
- The list is not exhaustive!
automatic or semi-automatic
32 Utilization and links: examples
- One-record structures (e.g. for accompanying relations)
- Computational links for shared characteristics (the KB example: "renæssance")
- Rules and codes (e.g. for derived relations)
- Computational solutions for work display
33 One-record structures
34 Rules and codes example: Reuse
- Widened use of a specific field in MARC formats to handle relations in a uniform way
- 787 Non-specific relationship entry (repeatable), with two subfields:
  - w: Record control number (target to link the current record to)
  - g: Relationship information (textual, optional)
35 Reuse (2)
- To distinguish between the various relationships, and to make them specific, our simple model proposes the use of indicator 2 in 787, as yet undefined. This indicator might take on the following values (and here, a full-scale model would not have to differ) (in parentheses: DC Simple terms for relations):
  - 0 Equivalence (facsimile or reproduction) (IsFormatOf)
  - 1 Simultaneous edition (IsVersionOf)
  - 2 Successive derivation, edition, version (IsVersionOf)
  - 3 Amplification (incl. commentaries, illustrations, criticism etc.) (IsBasedOn)
  - 4 Extraction (abridgements, condensations, excerpts)
  - 5 Recordings of performances
  - 6 Adaptation, modification (change of genre or medium, arrangement) (IsFormatOf)
  - 9 Translations (IsVersionOf)
  - a Accompanying relationship (supplements of any kind) (IsRequiredBy)
  - p Part → whole relationship (IsPartOf)
  - r Review or other descriptive relationship
  - s Sequential relationship (like successive title of a serial)
  - u Unspecific relationship, based on shared characteristics of other kinds
- (Eversberg, 1998)
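The proposed indicator values could be read by a display layer roughly as follows. This is only a sketch: the codes, labels and DC terms are copied from the slide, while the function name `describe_787`, the dict-based field representation and the sample record number are hypothetical and not part of any MARC library.

```python
# Hypothetical lookup table for the proposed second indicator of field 787.
# None means the slide gives no Dublin Core term for that code.
INDICATOR_2 = {
    "0": ("Equivalence", "IsFormatOf"),
    "1": ("Simultaneous edition", "IsVersionOf"),
    "2": ("Successive derivation, edition, version", "IsVersionOf"),
    "3": ("Amplification", "IsBasedOn"),
    "4": ("Extraction", None),
    "5": ("Recordings of performances", None),
    "6": ("Adaptation, modification", "IsFormatOf"),
    "9": ("Translations", "IsVersionOf"),
    "a": ("Accompanying relationship", "IsRequiredBy"),
    "p": ("Part-whole relationship", "IsPartOf"),
    "r": ("Review or other descriptive relationship", None),
    "s": ("Sequential relationship", None),
    "u": ("Unspecific relationship", None),
}


def describe_787(indicator_2, subfields):
    """Turn one 787 field into a small dict describing the link.

    `subfields` is a plain dict such as {"w": "...", "g": "..."}; this is an
    illustrative representation, not the API of a real MARC parser.
    """
    label, dc_term = INDICATOR_2.get(indicator_2, ("Unknown relationship", None))
    return {
        "relationship": label,
        "dc_term": dc_term,
        "target_record": subfields.get("w"),  # record control number to link to
        "note": subfields.get("g"),           # optional textual information
    }


# Made-up example field: a translation pointing at record "rec-0001".
link = describe_787("9", {"w": "rec-0001", "g": "Danish translation"})
print(link["relationship"], link["dc_term"], link["target_record"])
```

Keeping the mapping in one table means a catalogue display could label every 787 link uniformly, which is the point of handling relations "in a uniform way".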
36 On collocating the work
- Most users seek particular works, not particular editions. Yet works are published in the form of editions; the fundamental duty of descriptive cataloguing is to organize the resulting chaotic bibliographic universe to facilitate user access to works, and to allow them easily to select the edition of the work sought that best meets their needs (Yee, 1997, p. 64).
37 Computational solutions for work display
- FRBR Display Tool
  - The Library of Congress FRBR Display Tool was developed to transform bibliographic data found in MARC 21 record files into meaningful displays by grouping them into the work, expression and manifestation FRBR entities. Based on XML technologies, the tool may be altered to meet the needs of individual institutions. It also shows how the theoretical portion of the FRBR model can be used practically to allow librarians to evaluate the consistency of their local bibliographic data.
38 Work display: bibliotek.dk (The Danish Union Catalogue)
- An example of an almost totally automatic initiative is the display of editions of a work in the Danish Union Catalogue, bibliotek.dk
- Attributes like author and title are used in a best-match algorithm to identify different editions of the work.
- Due to a high level of authority control and the use of original titles, the different expressions of a work will normally be collocated in the search result.
39 bibliotek.dk - library.dk
- End-user version of the Danish Union Catalogue
- Sponsored by The Danish Library Agency, but maintained and developed by The Danish Library Centre (DBC)
- Content
  - The Danish national bibliography
  - All titles in public libraries and research libraries in Denmark
  - Content is not 100% equivalent to the Union Catalogue (availability matters)
- Works together with a national transportation system: users can pick up books from every library at their own (chosen) library
40 Adaptation of FRBR in bibliotek.dk
- The records in bibliotek.dk represent manifestations (AACR2/danMARC2).
- The aim is to present these records grouped according to the work they embody
- At one point our definition differs from FRBR
  - For practical reasons we consider expressions in different languages to be different works.
  - You could also say that in this case we prefer grouping according to the expression of the work.
- (Paul B. Jensen, Danish Library Center)
41 Implementing the work concept
- The work-level display is based on matching and collocating manifestation records on the fly
- This match is based on simple author and title data in normalized form
- From the work level you can expand to the manifestations, select one (or more) and make a request
- (Paul B. Jensen, Danish Library Center)
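A minimal sketch of matching on normalized author and title data might look like the following. It is not the bibliotek.dk algorithm (which the slides describe only as a best-match on author and title); it assumes exact equality of a normalized key, and the record layout and sample records are made up for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text):
    """Lowercase, strip combining accents and punctuation, collapse spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^\w\s]", " ", no_marks.lower())
    return " ".join(cleaned.split())


def work_key(record):
    """Assumed matching key: normalized author plus normalized title."""
    return (normalize(record["author"]), normalize(record["title"]))


def group_by_work(records):
    """Collocate manifestation records that share the same work key."""
    works = defaultdict(list)
    for record in records:
        works[work_key(record)].append(record)
    return dict(works)


# Illustrative manifestation records (hypothetical ids, simplified fields).
records = [
    {"id": 1, "author": "Brøgger, Suzanne", "title": "Linda Evangelista Olsen"},
    {"id": 2, "author": "BRØGGER, Suzanne", "title": "Linda Evangelista Olsen."},
    {"id": 3, "author": "Brøgger, Suzanne", "title": "Jadekatten"},
]

for key, members in group_by_work(records).items():
    print(key, [m["id"] for m in members])
```

Because the grouping is computed from the records at request time, no work-level records need to be catalogued by hand; the cost is that records with inconsistent author or title data will not be collocated.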
42 Accomplishment
- A more user-friendly interface (as confirmed by a majority of test users)
- A reduction of unnecessary inter-library loans, because it is easier to locate an edition at your local library (or libraries)
- (Paul B. Jensen, Danish Library Center)
43 Challenges (read: problems)
- In principle, a traditional AACR2/MARC record does not specify which bibliographic information refers to the work level and which to the expression/manifestation level
- Many bibliographic items contain more than one work
  - Collected plays in one volume (e.g. Shakespeare)
  - 3 novels in one volume
  - 3 symphonies on one CD
  - Etc.
- (Paul B. Jensen, Danish Library Center)
46 Neglect or choose edition
47 Show editions
Show full record
48 Other kinds of linkages
- Author pointers
  - citations, references and links
  - semantic equivalence (same as similarity below)
- Use-determined
  - frequencies
- Similarity-based
  - Co-occurrence of text elements
  - e.g. words in text, citations (bibliographic coupling)
- Third-party pointers
  - co-citations
  - articles, books ...
49 Types: author pointers
Author
Entities citing, linking to or referring to other entities
Entity
Entity
Entities cited by, referred to by or linked to by other entities
50 Types: use-determined relations
Entity
User
Entities bought or borrowed by the same user
Entity
51 Example: use-determined links
- Novel. Suzanne Brøgger: Linda Evangelista Olsen / Suzanne Brøgger. 4th printing. Copenhagen: Gyldendal, 2002. 134 pages. The cat Linda is presumed to be a reincarnation of the author's mother, who was herself a cat. And that fits well with the life mother and cat lead, and their way of influencing their surroundings .... Previously: 1st edition, 2001. Original edition: 2001. ISBN 87-00-48736-8, paperback, DKK 175.00.
- Others who have borrowed Suzanne Brøgger: Linda Evangelista Olsen have also borrowed:
  - Suzanne Brøgger: Ja
  - Suzanne Brøgger: En gris som har været oppe at slås kan man ikke stege
  - Suzanne Brøgger: Creme fraiche
  - Suzanne Brøgger: Jadekatten
  - Suzanne Brøgger: Tone
  - Jan Lyderik Tangs saga. Vol. 1-2
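A list of this kind can be derived from loan records alone. The sketch below is an assumption about how such a feature could work, not a description of the system shown on the slide; the loan data, the user labels and the function name `also_borrowed` are invented for illustration.

```python
from collections import Counter, defaultdict


def also_borrowed(loans, title, top_n=5):
    """Rank other titles borrowed by users who borrowed `title`.

    `loans` is an iterable of (user, title) pairs. Ties are broken
    alphabetically so the result is deterministic.
    """
    titles_by_user = defaultdict(set)
    for user, borrowed in loans:
        titles_by_user[user].add(borrowed)

    counts = Counter()
    for titles in titles_by_user.values():
        if title in titles:
            counts.update(titles - {title})

    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Invented loan data: users A and B borrowed the target title, C did not.
loans = [
    ("A", "Linda Evangelista Olsen"),
    ("A", "Jadekatten"),
    ("A", "Creme fraiche"),
    ("B", "Linda Evangelista Olsen"),
    ("B", "Jadekatten"),
    ("C", "Tone"),
]

print(also_borrowed(loans, "Linda Evangelista Olsen"))
```

Here the relation between two entities comes entirely from use: nothing in the bibliographic records themselves has to state that the titles are related.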
52 Similar entities
- Statistically based, e.g. vector space model using tf-idf weights
Entity 1
Entity 2
Shared elements
53 Statistically based similarity between documents
- Similarity measures between documents are used to identify relations. Links are established on the basis of threshold values.
- The most widespread technique is tf x idf weighting in connection with the vector space model.
- This will typically involve the following:
  - Significant features/words are identified (stop word list)
  - The words are weighted based on tf x idf and possibly positional parameters
  - The similarity between Doc. 1 and all other documents in the database is calculated using a given algorithm (e.g. as the cosine between two vectors (documents))
  - Similarity values exceeding a given threshold -> a link is established
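The four steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: idf is taken as log(N/n), no positional weighting is applied, and the stop word list, the sample documents and the threshold value are all invented for the example.

```python
import math
from collections import Counter

# Assumed, deliberately tiny stop word list.
STOP_WORDS = {"the", "a", "of", "and", "in", "for", "to"}


def tokenize(text):
    """Step 1: keep only significant words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def tfidf_vectors(docs):
    """Step 2: weight each term by tf * log(N / n)."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))
    vectors = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        vectors[doc_id] = {t: tf[t] * math.log(n_docs / doc_freq[t]) for t in tf}
    return vectors


def cosine(u, v):
    """Step 3: cosine between two sparse term-weight vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_links(docs, threshold):
    """Step 4: link every pair whose similarity exceeds the threshold."""
    vectors = tfidf_vectors(docs)
    ids = sorted(vectors)
    links = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            score = cosine(vectors[a], vectors[b])
            if score > threshold:
                links.append((a, b, round(score, 3)))
    return links


docs = {
    "d1": "latent semantic indexing for text retrieval",
    "d2": "image retrieval using latent semantic indexing",
    "d3": "danish cookbook for children",
}
print(similarity_links(docs, threshold=0.2))
```

With these three documents only d1 and d2 share weighted terms, so they are the only pair that can receive a link; raising the threshold trades fewer links for closer matches.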
54 Example
- Term A occurs in doc1, compared with doc2
- tfA = 5, but the term occurs once as metadata and once in a heading -> tfA = 7 (3 x 1 + 2 x 2) (Croft uses tf relative to the most frequent term in the doc)
- idf = 1/1000 (Croft uses log N/n)
- Term weight for term A between doc1 and doc2 = 7/1000
- All term weights enter into the algorithm, e.g. the calculation of the cosine.
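Written out, the arithmetic of this example is as follows. The reading that three plain occurrences count once and the two occurrences in metadata and heading count double is an interpretation of the "3 x 1 + 2 x 2" note, and the symbols $w_{t,1}$ and $w_{t,2}$ (weight of term $t$ in doc1 and doc2) are introduced here only for illustration.

```latex
\[
tf_A = 3 \times 1 + 2 \times 2 = 7, \qquad
idf_A = \frac{1}{1000}, \qquad
w_A = tf_A \cdot idf_A = \frac{7}{1000} = 0.007
\]
\[
\cos(d_1, d_2) =
\frac{\sum_t w_{t,1}\, w_{t,2}}
     {\sqrt{\sum_t w_{t,1}^{2}}\;\sqrt{\sum_t w_{t,2}^{2}}}
\]
```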
56 Similarity example
Similar pages
58 Third-party pointers
- Co-citations
- Similar to "others who have borrowed this book have also borrowed these materials"
- But from an author/domain perspective
- Example from Citeseer ->
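Co-citation counting can be sketched as below: two documents are related when third parties cite them together, and a threshold decides when the relation becomes a link. The citation data, the document labels and the default threshold are assumptions made for the example, not values taken from Citeseer.

```python
from collections import Counter
from itertools import combinations


def co_citation_counts(citations):
    """Count how often each pair of documents is cited by the same document.

    `citations` maps a citing document to the list of documents it cites.
    """
    counts = Counter()
    for cited in citations.values():
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts


def related_by_co_citation(citations, doc, threshold=2):
    """Documents co-cited with `doc` at least `threshold` times."""
    related = []
    for (a, b), n in co_citation_counts(citations).items():
        if n >= threshold and doc in (a, b):
            related.append((b if a == doc else a, n))
    return sorted(related, key=lambda item: (-item[1], item[0]))


# Invented citation data: Z1..Z4 are citing articles, W/X/Y are cited ones.
citations = {
    "Z1": ["X", "Y", "W"],
    "Z2": ["X", "Y"],
    "Z3": ["X", "W"],
    "Z4": ["Y"],
}

print(related_by_co_citation(citations, "X"))
```

As with co-loans, the link is created by third parties (here, citing authors) rather than by anything stated in X or Y themselves.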
59 Citeseer example
- Abstract: Latent Semantic Indexing (LSI) is a technique for representing documents, queries, and terms as vectors in a multidimensional real-valued space. The representations are approximations to the original term space encoding, and are found using the matrix technique of Singular Value Decomposition. In comparison, Multidimensional Scaling (MDS) is a class of data analysis techniques for representing data points as points in a multidimensional real-valued space. The objects are represented so that...
- Cited by:
  - Automated Modeling and Nonlinear Axis Scaling - Leejay Wu (2005)
- Similar documents (at the sentence level):
  - 8.5: Optimizing Ranking Functions: A Connectionist Approach to.. - Bartell (1994)
- Active bibliography (related documents):
  - 0.2: A Survey of Information Retrieval and Filtering Methods - Faloutsos, Oard (1996)
  - 0.2: Document Space Models Using Latent Semantic Analysis - Gotoh, Renals (1997)
  - 0.2: Approximating Matrix Multiplication for Pattern Recognition Tasks - Cohen, Lewis (1997)
- Similar documents based on text:
  - 0.5: Chapter 15 Getting Better Results With Latent Semantic Indexing - Nakov (2000)
  - 0.4: Image Retrieval using Latent Semantic Indexing - Pecenovic (1997)
  - 0.4: On the Use of Singular Value Decomposition for Text Retrieval - Husbands, Simon, Ding (2000)
- Related documents from co-citation (with a co-citation threshold)
60 Conclusion and perspectives: designing OPACs and integrated search tools according to relations
- A lot of possibilities: lots of types of relationships to display and utilize in different ways
  - Bibliographic families, shared characteristics, whole-part and other bibliographic relations
  - Similarity (statistical), co-citations, user-defined (co-use)
  - Among others
- Need for careful design of system features/link structures and a lot of testing (measuring not only user satisfaction but, essentially, improved search results)
- In other words: pick the functionalities that work for the user, not the ones you like or are familiar with
- In general, we lack large-scale user investigations