MARCXTM: Topic Maps Modeling of MARC Bibliographic Information
2005.10.07
Hyun-Sil Lee, Yang-Seung Jeon, Sung-Kook Han
Semantic Web Services Research Group, Won Kwang University, Korea
Agenda
Overview of MARC
- MARC: Machine-Readable Cataloging
  - Standards used for the representation of bibliographic and related information for books and other library materials in machine-readable form, and for their communication to and from other computers.
- All MARC standards conform to ISO 2709:1996, Information and documentation - Format for information exchange.
- MARC was originally designed in the late 1960s to aid the transfer of bibliographic data onto magnetic tape and to replace printed catalog cards with electronic forms.
- There are a number of implementations of MARC, including USMARC (used in the US), CAN/MARC (used in Canada), and UKMARC (used in Britain).
- After discussions and minor changes to USMARC and CAN/MARC, MARC 21 evolved to harmonize both formats and to cover diverse types of resources, including digital materials and Internet resources.
- MARC accommodates extensive data elements describing all forms of materials susceptible to bibliographic description, as well as related information.
Family of MARC Formats
- Bibliographic
  - A carrier for bibliographic information about printed and manuscript textual materials, computer files, maps, music, serials, visual materials, and mixed materials.
- Authorities
  - A carrier for information concerning the authorized forms of names, titles, subjects, and subject subdivisions to be used in constructing access points in MARC records; the forms of these names, subjects, and subdivisions that should be used as references to the authorized form; and the relationships among these forms.
- Holdings
  - A carrier for holdings information for three types of bibliographic items (single-part, multipart, and serial). It may include copy-specific information, information peculiar to the holding institution, information needed for local processing, maintenance, or preservation, and version information.
- Classification
  - A carrier for information about classification numbers and the captions associated with them, formulated according to a specified authoritative classification scheme.
- Community Information
  - A carrier for descriptions of non-bibliographic resources that fulfil the information needs of a community.
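Which member of this family a record belongs to is signalled in the MARC 21 leader, at character position 06 (type of record). The Python sketch below maps that code to a format name. It looks only at Leader/06, and the names `FORMAT_BY_TYPE` and `marc_format` are hypothetical, not part of any library.

```python
# Leader/06 (type of record) codes, grouped by MARC 21 format family.
FORMAT_BY_TYPE = {
    "z": "Authorities",
    "w": "Classification",
    "q": "Community Information",
    **{code: "Holdings" for code in "uvxy"},
    **{code: "Bibliographic" for code in "acdefgijkmoprt"},
}


def marc_format(leader: str) -> str:
    """Return the format family implied by Leader/06, or 'Unknown'."""
    if len(leader) != 24:
        raise ValueError("a MARC 21 leader is exactly 24 characters long")
    return FORMAT_BY_TYPE.get(leader[6], "Unknown")


print(marc_format("00000nam a2200000 a 4500"))  # Bibliographic
```

A single lookup is enough here because the format family is fixed by one coded position; everything else in the leader is interpreted relative to that family.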
Supporting Documentation of MARC
- MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media
- Character sets
  - MARC-8 (8-bit encoding)
  - UCS/Unicode (UTF-8 encoding)
  - Repertoire of 15,000 characters
  - Scripts: Latin, Cyrillic, Hebrew, Arabic, CJK
- Code lists
  - Countries, Geographic Areas, Languages, Sources, Relators
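A MARC 21 record declares which of the two character sets it uses in its leader: position 09 is blank for MARC-8 and `a` for UCS/Unicode. A minimal Python check might look like this; the function name is hypothetical, and converting MARC-8 text itself is outside the scope of the sketch.

```python
def character_coding_scheme(leader: str) -> str:
    """Interpret Leader/09 (character coding scheme) of a MARC 21 record."""
    if len(leader) != 24:
        raise ValueError("a MARC 21 leader is exactly 24 characters long")
    code = leader[9]
    if code == " ":
        return "MARC-8"
    if code == "a":
        return "UCS/Unicode"
    raise ValueError(f"unexpected Leader/09 value: {code!r}")


print(character_coding_scheme("00000nam a2200000 a 4500"))  # UCS/Unicode
```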
MARC Record Format
- Leader: the first 24 characters of the record, defining parameters for processing the record. Its data elements contain coded values and are identified by relative character position.
- Directory: entries that contain the tag, starting location, and length of each variable field within the record. The directory is constructed by computer from the bibliographic record and can be reconstructed in the same way if any of the cataloging information is altered.
- Variable fields
  - Variable control fields: the 00X fields in the MARC 21 formats. Each contains either a single data element or a series of fixed-length data elements identified by relative character position.
  - Variable data fields
    - Indicators: the first two characters of the field, which interpret or supplement the data found in the field.
    - Subfield codes: two characters that precede each data element within a field that requires separate manipulation.
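To make the relationship between leader, directory, and variable fields concrete, here is a minimal Python sketch that builds and then parses an ISO 2709 record. It assumes UTF-8 data and the standard MARC 21 delimiters (0x1D record terminator, 0x1E field terminator, 0x1F subfield delimiter); the field representation and the helper names `build_record` and `parse_record` are hypothetical, and real records need far more validation than this.

```python
FT = "\x1e"  # field terminator
SD = "\x1f"  # subfield delimiter
RT = "\x1d"  # record terminator


def build_record(fields, leader="00000nam a2200000 a 4500"):
    """Serialize (tag, text) pairs; data-field text starts with two indicators."""
    directory, data = b"", b""
    for tag, text in fields:
        body = (text + FT).encode("utf-8")
        # Directory entry: tag (3), field length (4), starting position (5).
        directory += f"{tag}{len(body):04d}{len(data):05d}".encode("ascii")
        data += body
    directory += FT.encode("ascii")
    base = 24 + len(directory)     # Leader/12-16: base address of data
    total = base + len(data) + 1   # Leader/00-04: record length, incl. terminator
    lead = f"{total:05d}{leader[5:12]}{base:05d}{leader[17:]}"
    return lead.encode("ascii") + directory + data + RT.encode("ascii")


def parse_record(raw):
    """Split a record into its leader and a list of control/data fields."""
    leader = raw[:24].decode("ascii")
    base = int(leader[12:17])
    directory = raw[24:base - 1]   # drop the directory's field terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        # Slice the field out of the data area, minus its field terminator.
        text = raw[base + start:base + start + length - 1].decode("utf-8")
        if tag.startswith("00"):   # variable control field
            fields.append((tag, text))
        else:                      # variable data field: indicators + subfields
            subfields = [(chunk[0], chunk[1:]) for chunk in text[2:].split(SD) if chunk]
            fields.append((tag, text[:2], subfields))
    return leader, fields


raw = build_record([("001", "12345"),
                    ("245", "10" + SD + "aMARC in XML /" + SD + "cH. Lee.")])
leader, fields = parse_record(raw)
print(leader)  # 00083nam a2200049 a 4500
```

Note that the directory stores byte lengths and offsets, so the sketch measures encoded UTF-8 bytes rather than characters; mixing the two would break the offsets as soon as a non-ASCII character appears.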
MARC Record Format Example
MARC Record Format Example
Formalization of MARC
<MARC21Record> ::= <Leader> <Directory> <VariableField>
<Directory> ::= <DirectoryElement>
<DirectoryElement> ::= <Tag> <Length> <Position>
<VariableField> ::= <ControlField> <DataField>
<ControlField> ::= <ControlNumber> <ControlFieldElement>
<DataField> ::= <Tag> <Indicator> <SubField>
<Indicator> ::= <FirstIndicator> <SecondIndicator>
<SubField> ::= <SubFieldCode> <SubFieldValue>
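The formalization above maps naturally onto a small object model. In this Python sketch the class and method names are hypothetical, `ControlNumber` is read as the 00X tag, and each component is treated as a repeatable list (the flattened grammar does not show repetition explicitly). The directory is not filled in by hand but regenerated from the fields, echoing the point that the directory is constructed by computer.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class DirectoryElement:
    tag: str
    length: int    # field length in bytes, including the field terminator
    position: int  # starting position relative to the base address of data


@dataclass
class ControlField:
    tag: str       # <ControlNumber>: one of the 00X tags (assumption)
    value: str     # <ControlFieldElement>


@dataclass
class SubField:
    code: str
    value: str


@dataclass
class DataField:
    tag: str
    first_indicator: str
    second_indicator: str
    subfields: List[SubField] = field(default_factory=list)


@dataclass
class MARC21Record:
    leader: str
    control_fields: List[ControlField] = field(default_factory=list)
    data_fields: List[DataField] = field(default_factory=list)
    directory: List[DirectoryElement] = field(default_factory=list)

    def _serialized(self) -> Iterator[Tuple[str, str]]:
        """Yield each field's tag and ISO 2709 text (without terminator)."""
        for cf in self.control_fields:
            yield cf.tag, cf.value
        for df in self.data_fields:
            codes = "".join("\x1f" + sf.code + sf.value for sf in df.subfields)
            yield df.tag, df.first_indicator + df.second_indicator + codes

    def build_directory(self) -> None:
        """Regenerate the directory from the current field contents."""
        self.directory, position = [], 0
        for tag, text in self._serialized():
            length = len(text.encode("utf-8")) + 1  # +1 for the field terminator
            self.directory.append(DirectoryElement(tag, length, position))
            position += length
```

Keeping the directory derived rather than stored means editing a subfield can never leave stale offsets behind: call `build_directory()` again after any change.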
Problems with MARC
- Lack of extensibility due to rigid record formats, since MARC was originally intended for the production of printed catalog cards in the 1960s
- Difficulties in representing bibliographic relationships
- Ambiguities in describing MARC records
- Incompatibilities among different MARC formats, since various library systems have invented their own non-standard peculiarities to handle local bibliographic materials
- Weaknesses in describing the bibliographic attributes of digitized resources
MARCXML
[Figure: MARCXML framework, linking MARC 21 (ISO 2709) records and MARC 21 (XML) records with tagging transformations, character set conversion, Dublin Core records, MODS records, other XML formats, HTML output, and MARC validation]
MARCXML
- MARCXML: a framework for working with MARC data in an XML environment
- Design considerations and features
  - Simple and flexible MARC XML schema for representing a complete MARC record in XML
  - Supports all MARC-encoded data regardless of format
  - Lossless conversion of MARC to XML
  - Round-trip ability from XML back to MARC
  - Data presentation and data conversion
  - Extensibility
    - A component-oriented, extensible architecture allowing users to plug and play different software pieces to build custom solutions
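The lossless conversion and round-trip features listed above can be illustrated with Python's standard `xml.etree.ElementTree`. The element and attribute names (`record`, `leader`, `controlfield`, `datafield` with `tag`/`ind1`/`ind2`, `subfield` with `code`) and the namespace `http://www.loc.gov/MARC21/slim` follow the MARCXML schema; the tuple-based field representation and the function names are assumptions of this sketch, which handles a single record and does no validation.

```python
import xml.etree.ElementTree as ET

NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", NS)


def q(name):
    """Qualify a MARCXML element name with the MARC21/slim namespace."""
    return f"{{{NS}}}{name}"


def to_marcxml(leader, fields):
    """Fields are (tag, value) or (tag, indicators, [(code, value), ...])."""
    record = ET.Element(q("record"))
    ET.SubElement(record, q("leader")).text = leader
    for f in fields:
        if len(f) == 2:  # variable control field
            ET.SubElement(record, q("controlfield"), {"tag": f[0]}).text = f[1]
        else:            # variable data field
            tag, inds, subfields = f
            df = ET.SubElement(record, q("datafield"),
                               {"tag": tag, "ind1": inds[0], "ind2": inds[1]})
            for code, value in subfields:
                ET.SubElement(df, q("subfield"), {"code": code}).text = value
    return ET.tostring(record, encoding="unicode")


def from_marcxml(text):
    """Inverse of to_marcxml: rebuild the leader and field tuples."""
    record = ET.fromstring(text)
    leader, fields = record.find(q("leader")).text, []
    for el in record:
        if el.tag == q("controlfield"):
            fields.append((el.get("tag"), el.text))
        elif el.tag == q("datafield"):
            subs = [(sf.get("code"), sf.text) for sf in el.findall(q("subfield"))]
            fields.append((el.get("tag"), el.get("ind1") + el.get("ind2"), subs))
    return leader, fields


leader = "00083nam a2200049 a 4500"
fields = [("001", "12345"),
          ("245", "10", [("a", "MARC in XML /"), ("c", "H. Lee.")])]
print(to_marcxml(leader, fields))
```

Indicators are kept as separate `ind1`/`ind2` attributes, including blanks, which is what lets the XML form carry every position of the original field and makes the round trip lossless for this data.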
MARCXML Example
MODS
- MODS: Metadata Object Description Schema
  - An XML-based descriptive metadata standard that includes a subset of data elements derived from MARC 21
- Features
  - MODS is intended to complement other metadata formats. MODS provides a richer bibliographic element set than Dublin Core.
  - MODS has a high level of compatibility with MARC records because it inherits the semantics of the equivalent data elements in the MARC 21 bibliographic format.
  - In MODS, some elements that appear in various fields in MARC have been repackaged into one. As a result, MODS defines 19 top-level metadata elements.
  - MODS takes advantage of the XML environment: it uses language-based tags rather than the numeric tags traditional to MARC.
  - MODS also has flexible linking mechanisms, providing all top-level elements with attributes such as xlink and ID.
  - MODS accommodates special requirements for digital resources.
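The shift from numeric MARC tags to language-based MODS tags can be sketched for a handful of subfields. The pairings below (245 $a to `titleInfo/title`, 100 $a to `name/namePart` with `type="personal"`, 260 $b and $c to `originInfo/publisher` and `originInfo/dateIssued`) follow the general pattern of the Library of Congress MARC-to-MODS mapping, but this is a tiny illustrative subset with a hypothetical input format (a dict keyed by tag and subfield code), not a complete converter.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def m(name):
    """Qualify an element name with the MODS v3 namespace."""
    return f"{{{MODS_NS}}}{name}"


# (MARC tag, subfield code) -> (MODS top-level element, MODS child element)
MAPPING = [
    (("245", "a"), ("titleInfo", "title")),
    (("100", "a"), ("name", "namePart")),
    (("260", "b"), ("originInfo", "publisher")),
    (("260", "c"), ("originInfo", "dateIssued")),
]


def marc_to_mods(values):
    """values: dict keyed by (tag, subfield code); unmapped keys are ignored."""
    mods = ET.Element(m("mods"))
    parents = {}  # reuse one top-level element when several subfields share it
    for key, (parent_name, child_name) in MAPPING:
        if key not in values:
            continue
        if parent_name not in parents:
            # Field 100 is a personal name, so mark the MODS name accordingly.
            attrs = {"type": "personal"} if parent_name == "name" else {}
            parents[parent_name] = ET.SubElement(mods, m(parent_name), attrs)
        ET.SubElement(parents[parent_name], m(child_name)).text = values[key]
    return ET.tostring(mods, encoding="unicode")


print(marc_to_mods({("245", "a"): "MARC in XML", ("260", "c"): "2005"}))
```

Grouping 260 $b and $c under a single `originInfo` element is a small example of the repackaging the slide describes: data spread across MARC subfields ends up under one named MODS element.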
MODS Example
Topic Maps Modeling of MARC 21
- Requirements for MARC modeling
  - A model should be able to support the full set of data elements in MARC 21 to achieve seamless compatibility with MARC formats.
    - This is a practical requirement in order to embrace current circumstances, even though it is awkward.
  - It should have the same expressive power as metadata.
    - This implies that the model should be realized with semantic descriptors usable in an XML environment instead of obsolete alphanumeric codes.
    - The use of attributes should be minimized to maintain consistency and increase readability.
  - It should be able to maintain the structure of the MARC record format.
  - The model is not intended to develop a bibliographic metadata system based on MARC.
  - The model should be usable without expertise in MARC.
  - The model should be simple and lightweight for system implementation and harmonization with other models.
UML Diagram of MARC Modeling
MARCXTM Implementation
XTM Realization of MARC Specification
- DataField: an <association> of the data item, indicators, and subfield codes
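One way to picture a data field as an XTM 1.0 `<association>` is to give each part (data item, indicators, subfield codes) its own `<member>` with a role. In this Python sketch the namespaces are those of XTM 1.0 and XLink, but every topic ID (`PersonalNameField`, `DataItem`, `Indicator`, `SubfieldCode`, `Field100`, `Subfield100a`) and the function name are hypothetical placeholders, not the identifiers used in MARCXTM.

```python
import xml.etree.ElementTree as ET

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("", XTM_NS)
ET.register_namespace("xlink", XLINK_NS)


def _el(parent, name):
    """Append an XTM element to parent and return it."""
    return ET.SubElement(parent, f"{{{XTM_NS}}}{name}")


def _topic_ref(parent, topic_id):
    """Append <topicRef xlink:href="#topic_id"/> to parent."""
    ET.SubElement(parent, f"{{{XTM_NS}}}topicRef",
                  {f"{{{XLINK_NS}}}href": f"#{topic_id}"})


def datafield_association(field_topic, members):
    """members: list of (role topic ID, player topic ID) pairs."""
    assoc = ET.Element(f"{{{XTM_NS}}}association")
    _topic_ref(_el(assoc, "instanceOf"), field_topic)
    for role, player in members:
        member = _el(assoc, "member")
        _topic_ref(_el(member, "roleSpec"), role)
        _topic_ref(member, player)
    return ET.tostring(assoc, encoding="unicode")


print(datafield_association("PersonalNameField", [
    ("DataItem", "Field100"),
    ("Indicator", "TypeOfPersonalNameEntryElement"),
    ("SubfieldCode", "Subfield100a"),
]))
```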
XTM Realization of MARC Specification
- Hiding the real data value by topic abstraction

<topic id="TypeOfPersonalNameEntryElement">
  <baseName>
    <baseNameString>Type of personal name entry element</baseNameString>
  </baseName>
  <occurrence>
    <instanceOf>
      <topicRef xlink:href="Forename"/>
    </instanceOf>
    <resourceData>0</resourceData>
  </occurrence>
  <occurrence>
    <instanceOf>
      <topicRef xlink:href="Surname"/>
    </instanceOf>
    <resourceData>1</resourceData>
  </occurrence>
  <occurrence>
    <instanceOf>
      <topicRef xlink:href="FamilyName"/>
    </instanceOf>
    <resourceData>3</resourceData>
  </occurrence>
</topic>
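Because the TypeOfPersonalNameEntryElement topic hides raw indicator values behind named occurrence types, an application can resolve a value such as the first indicator of field 100 back to its meaning. The Python sketch below embeds that topic in a minimal XTM 1.0 `topicMap` wrapper (the wrapper and the function name `value_table` are assumptions) and builds a value-to-topic table from it.

```python
import xml.etree.ElementTree as ET

XTM = "{http://www.topicmaps.org/xtm/1.0/}"
HREF = "{http://www.w3.org/1999/xlink}href"

TOPIC_MAP = """<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <topic id="TypeOfPersonalNameEntryElement">
    <baseName><baseNameString>Type of personal name entry element</baseNameString></baseName>
    <occurrence>
      <instanceOf><topicRef xlink:href="Forename"/></instanceOf>
      <resourceData>0</resourceData>
    </occurrence>
    <occurrence>
      <instanceOf><topicRef xlink:href="Surname"/></instanceOf>
      <resourceData>1</resourceData>
    </occurrence>
    <occurrence>
      <instanceOf><topicRef xlink:href="FamilyName"/></instanceOf>
      <resourceData>3</resourceData>
    </occurrence>
  </topic>
</topicMap>"""


def value_table(xtm_text, topic_id):
    """Map each occurrence's resourceData value to the topic it is an instance of."""
    root = ET.fromstring(xtm_text)
    for topic in root.iter(XTM + "topic"):
        if topic.get("id") == topic_id:
            return {
                occ.find(XTM + "resourceData").text.strip():
                    occ.find(f"{XTM}instanceOf/{XTM}topicRef").get(HREF)
                for occ in topic.findall(XTM + "occurrence")
            }
    raise KeyError(topic_id)


table = value_table(TOPIC_MAP, "TypeOfPersonalNameEntryElement")
print(table["1"])  # Surname
```

The lookup runs in the direction a reader needs: a cataloger's numeric indicator becomes a named concept, while the topic map itself still stores the raw code for round-tripping back to MARC.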
MARCXTM for MARC Specification
XTM Realization of MARC Records
- MARC records
  - Complex to maintain the MARC structure due to the idiosyncratic dependency between indicators and subfield codes
  - Difficult to achieve seamless compatibility with MARC records
  - Repeatability of subfield elements is defined individually in the MARC specification
- XTM support for MARC modeling
  - XTM does not provide multiple instances for <occurrence>.
  - Difficult to define a record schema with <association>
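The per-subfield repeatability rules are one reason a generic record schema struggles: each tag carries its own table. A small Python check makes the idea concrete. The table covers only a few subfields of fields 245 and 650 (True = repeatable (R), False = non-repeatable (NR), as given in the MARC 21 Bibliographic format), and the names `REPEATABLE` and `repeatability_errors` are hypothetical.

```python
# Partial, illustrative repeatability table: True = R, False = NR.
REPEATABLE = {
    "245": {"a": False, "b": False, "c": False, "n": True, "p": True},
    "650": {"a": False, "v": True, "x": True, "y": True, "z": True},
}


def repeatability_errors(tag, subfields):
    """Report non-repeatable subfield codes that occur more than once."""
    rules = REPEATABLE.get(tag, {})
    seen, errors = set(), []
    for code, _value in subfields:
        # Unknown codes (not in the table) are not flagged by this sketch.
        if code in seen and rules.get(code) is False:
            errors.append(f"{tag} ${code} is not repeatable")
        seen.add(code)
    return errors


print(repeatability_errors("245", [("a", "First title"), ("a", "Second title")]))
```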
XTM Realization of MARC Records
MARCXTM for MARC Records
Conclusions
- MARCXTM: a Topic Maps-based implementation of MARC 21
  - MARCXTM for MARC Specification
  - MARCXTM for MARC Records
- Application of the Topic Maps paradigm to a bibliographic information system
  - Seamless compatibility with MARC 21
  - Expressive power as metadata
- XTM is ill-suited to representing the MARC format due to MARC's idiosyncratic structure and the dependencies between its data elements.
- Metadata models similar to Dublin Core or MODS are necessary for XTM modeling of MARC.
- The FRBR (Functional Requirements for Bibliographic Records) framework is an attractive model for XTM modeling of bibliographic information systems.
Thank you!