Title: Models and Requirements of Metadata: Metadata Projects at Tsukuba and Lessons Learned
1. Models and Requirements of Metadata: Metadata Projects at Tsukuba and Lessons Learned
- Shigeo Sugimoto
- Research Center for Knowledge Communities, Graduate School of Library, Information and Media Studies
- University of Tsukuba, Japan
- sugimoto_at_slis.tsukuba.ac.jp
2. Self Introduction
- Degrees in computer science, specialized in software engineering and computer languages
- Joined University of Library and Information Science (ULIS) in 1983
- ULIS merged with Univ. of Tsukuba in 2002
- Member of Board of Trustees and Advisory Board of Dublin Core Metadata Initiative (DCMI)
3. Outline
- Japanese DL Activities and Issues
- DL Development in Japan
- Some Activities at National Diet Library, National Institute of Informatics, National Archives of Japan
- Information Access Environment for Libraries and Users
- Metadata Centric Projects at Tsukuba and Some Lessons Learned
- Community-oriented Metadata Vocabularies
- Interoperability of Metadata and Metadata Schemas
4. Some DL Activities and Environment in Japan and Metadata Projects at Tsukuba
- Shigeo Sugimoto
- Research Center for Knowledge Communities, Graduate School of Library, Information and Media Studies
- University of Tsukuba, Japan
- sugimoto_at_slis.tsukuba.ac.jp
5. Some Topics in Digital Library Development in Japan
- e-Japan program: National program to promote ICT infrastructure
- e-government, education, business, welfare, etc.
- Projects at Libraries and related organizations
- Web Archiving by National Diet Library (NDL)
- National Institute of Informatics (NII)
- National Archives of Japan
6. National Diet Library: Legal Deposit of Electronic Resources
- Legal deposit of electronic resources
- Discussion by the council on legal deposit of NDL since the late 1990s
- Tangible resources, e.g. CDs and DVDs
- Covered by the legal deposit law for conventional materials
- Networked resources
- Recommendation in December 2004
- Not covered by the legal deposit law
- Issues
- Policies to collect electronic resources, especially Web resources
- Technologies and policies to preserve electronic resources
7. National Diet Library: Web Archiving and Other Activities
- WARP: an experimental Web archiving project
- Selective, IPR clearance
- High cost
- Digital preservation in the e-Japan program
- NDL will be a key player to collect and preserve networked resources
- Reference Database
- Database of reference service records
- Collaborative development with public libraries
- Digitization program
- Books published in the Meiji era (latter half of the 19th century)
8. National Institute of Informatics
- NII inherits the functions of NACSIS
- National Hub for Japanese University Libraries, e.g. Union Catalog
- Scholarly Database provider
- Cultural Heritage Online
- Collaborative development of a Cultural Portal
- Ministry of Internal Affairs and Communications
- Agency for Cultural Affairs
- Museums
- Hosted at NII
- A related project: Digital Silk Roads
- GeNii: Global Environment for Networked Intellectual Information
- Scholarly Information Portal
- A related project: JuNii
- Gateway to scholarly information from universities
9. National Archives of Japan
- Digital Collection at Japan Center for Asian Historical Records
- Digital collection of government documents from 1860 to World War II
- http://www.jacar.go.jp/asia_en/index_en.html
- Launched a new information system in April 2005
- Digitization
- Metadata Issues for future development
- Issues
- Digital Archive
- Preservation of born-digital government resources
10. Regional Public Libraries
- Providing Information via WWW
- Library homepages
- OPAC, Digitized contents
- Issues
- Gateway to regional/community resources
- Preservation of digital resources, especially born-digital resources
- Helping communities
- Collaboration with regional schools
- Information resources for small business
- Information resources for young parents
- Medical, welfare information, etc.
11. Internet Access Environment in Japan
- Statistics from Ministry of Internal Affairs (Soumu-sho) and 2003 White Paper "Information and Communication"
- Number of Subscriptions for Internet services
- April 2002: DSL 2.7 M, FTTH 0.03 M, Cell Phone 52.9 M
- April 2004: DSL 11.5 M, FTTH 1.2 M, Mobile Phone 70 M
- Internet User Population as of December 2002
- From PC 57 M, from Cell Phone 28 M, TV-Game Machine 3.7 M (Cell Phone Only 10 M, PC Only 38 M)
- Broadband Connection to Home and Schools
12. Information Access Environment for Libraries and Users
- Mobile Phone as an Internet Access Terminal
- WWW access from mobile phones
- A library service example: OPAC
- Retrieving catalogs on a street and between bookshelves
- Content delivery to mobile phones
- Ubiquitous Information Access Environment
13. Metadata Centric Projects at Tsukuba
- ULIS-DL
- Collecting Library and LIS Web pages since 1999
- IPL-Asia
- Collects resources useful for public library users and provides information about the resources in CJK languages
- Digital Okayama Dai-Hyakka
- A gateway to regional Web resources by Okayama Prefecture Library (a regional public library)
- Metadata Schema Registry and a Model for Metadata Interoperability
- Collaboration with DCMI
- Started in 1998, first meeting at AIT, Thailand
14. Building Core Subject Vocabulary for ULIS-DL
- Outline of ULIS-DL
- Subject gateway for resources published by libraries and LIS institutions.
- Metadata records created based on Simple Dublin Core.
- ULIS-DL has a retrieval function but no directory-style interface to browse and navigate the contents.
- Subject terms are given as free terms, i.e. no controlled vocabulary.
- Goal of our research
- To create a subject vocabulary for a directory-style interface for ULIS-DL
15. Building Core Subject Vocabulary for ULIS-DL
- Status
- 26,000 metadata records (as of 2003)
- 16,000 distinct text strings in the Subject element of the raw metadata records
- Issue: How to choose appropriate subject terms
- Result: approximately 90% of collected sites are covered by 1025 subject terms.
- Not a big vocabulary, but a small vocabulary tailored to the subject domain and community
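The slides do not say how the 1025 terms were chosen. As a rough illustration only, the sketch below assumes a simple frequency-based selection: normalize the free-text Subject strings, then keep adding the most widely used terms until a target share of sites is covered. The function name, the normalization (trim and lowercase), and the sample data are all hypothetical.

```python
from collections import Counter


def select_core_terms(site_subjects, target_coverage=0.9):
    """Pick frequent subject terms until the target share of sites is covered.

    site_subjects maps a site identifier to its list of free-text subject strings.
    Returns (selected_terms, achieved_coverage).
    """
    # Hypothetical normalization: trim whitespace and lowercase each string.
    normalized = {
        site: {s.strip().lower() for s in subjects if s.strip()}
        for site, subjects in site_subjects.items()
    }
    # Count how many sites use each normalized term.
    freq = Counter(t for terms in normalized.values() for t in terms)
    total = len(normalized)
    selected, covered = [], set()
    # Most frequent terms first; ties broken alphabetically for determinism.
    for term, _ in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0])):
        if total and len(covered) / total >= target_coverage:
            break
        selected.append(term)
        covered |= {site for site, terms in normalized.items() if term in terms}
    coverage = len(covered) / total if total else 0.0
    return selected, coverage


if __name__ == "__main__":
    sample = {
        "site-a": ["Libraries", "OPAC"],
        "site-b": ["libraries "],
        "site-c": ["Archives"],
        "site-d": [],
    }
    print(select_core_terms(sample, target_coverage=0.5))
```

A site with no subject strings can never be covered, so the achieved coverage may stay below the target even after every term has been selected.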
17. IPL-Asia
- IPL-Asia
- Provides information about CJK resources useful as public library resources.
- Provides resource information in CJK languages.
- Lessons learned
- Domain-oriented subject vocabularies need not be large but need community-specific terms, e.g., school activities and regional activities.
- A user interface for children needs to present subject terms in a form suited to their ages.
- Costs
18. Digital Okayama Dai-Hyakka (DODH)
- Regional Portal by the Library of Okayama Prefecture
- Okayama: a prefecture in the western part of Japan
- Metadata creation by librarians and non-professionals, e.g. school teachers, students, and volunteers.
- Small set of subject terms usable by the non-professionals and designed in accordance with regional needs.
- NDC (Nippon Decimal Classification) is also used.
- Representation of subjects in accordance with user age
19. Some Lessons Learned
- Is a comprehensive/conventional subject vocabulary really useful?
- A lesson learned in IPL-Asia
- Distribution of resource domains
- Subject terms for children and children's resources
- Okayama uses two small sets of terms in addition to NDC
- Librarians' concern: maintenance of subject vocabulary
- Use Semantic Web technology.
20. Some Lessons Learned
- Multiple labels for a single concept, chosen in accordance with the type of audience, to improve accessibility
- Encoded in Web Ontology Language (OWL)
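The idea of one concept carrying several audience-specific labels can be sketched in plain Python. This is a stand-in for illustration, not the OWL encoding used in the projects; the class, the concept identifier, and the labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A single subject concept with one default label and optional per-audience labels."""
    concept_id: str
    default_label: str
    audience_labels: dict = field(default_factory=dict)

    def label_for(self, audience):
        # Fall back to the default label when no audience-specific label exists.
        return self.audience_labels.get(audience, self.default_label)


if __name__ == "__main__":
    # Hypothetical concept and labels, for illustration only.
    astronomy = Concept(
        concept_id="ex:astronomy",
        default_label="Astronomy",
        audience_labels={"child": "Stars and Planets"},
    )
    print(astronomy.label_for("child"))
    print(astronomy.label_for("adult"))
```

Because every label hangs off the same concept identifier, a resource indexed once can be presented under whichever label suits the reader.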
21. Comparison of Subject Vocabularies - Okayama's Case -
- Comprehensive and conventional subject vocabularies are not always useful for domain-specific resources.
- Comparison between subject vocabularies used in Okayama
- Prefecture government's subject vocabulary for governmental resources (Prefecture Vocabulary)
- A subject vocabulary for children (Kids Vocabulary)
- Three vocabularies used in DODH
- NDC, PV, and KV
- Mappings between all pairs of these three vocabularies
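One way to compare a small local vocabulary with NDC is to count how many distinct NDC terms its mapping touches in each top-level class (x00). The sketch below assumes a mapping is a dictionary from a local term to a list of three-digit NDC codes. The function name and the sample mapping are hypothetical and do not come from the actual DODH data.

```python
from collections import Counter

# Top-level NDC classes as named in the slides.
NDC_CLASSES = {
    "000": "Generalities", "100": "Philosophy", "200": "History",
    "300": "Social Sciences", "400": "Natural Sciences", "500": "Technology",
    "600": "Industry", "700": "The Arts", "800": "Language", "900": "Literature",
}


def ndc_distribution(mapping):
    """Count distinct NDC codes used in a mapping, grouped by top-level class."""
    # Collect each NDC code once, even if several local terms map to it.
    used_codes = {code for codes in mapping.values() for code in codes}
    # The first digit of an NDC code determines its x00 class.
    counts = Counter(code[0] + "00" for code in used_codes)
    return {cls: counts.get(cls, 0) for cls in NDC_CLASSES}


if __name__ == "__main__":
    # Illustrative mapping from kids-vocabulary terms to NDC codes (not real data).
    kv_to_ndc = {
        "animals": ["480"],
        "stars": ["440"],
        "folk tales": ["388", "913"],
    }
    for cls, n in ndc_distribution(kv_to_ndc).items():
        print(cls, NDC_CLASSES[cls], n)
```

A mapping concentrated in a few classes suggests that the local vocabulary covers only a narrow part of the comprehensive scheme.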
22. Distribution of Terms in the NDC term space - Okayama's Case -
[Figure: number of NDC terms in each x00 class used in the KV/PV mapping. NDC classes: 000 Generalities, 100 Philosophy, 200 History, 300 Social Sciences, 400 Natural Sciences, 500 Technology, 600 Industry, 700 The Arts, 800 Language, 900 Literature]
23. Distribution of Terms in the NDC term space - Okayama's Case -
[Figure: number of NDC terms in each x00 class used in the KV/PV mapping. NDC classes: 000 Generalities, 100 Philosophy, 200 History, 300 Social Sciences, 400 Natural Sciences, 500 Technology, 600 Industry, 700 The Arts, 800 Language, 900 Literature]
24. Subject Vocabularies
[Figure: diagram relating KV (Educational and Learning Resources), PV (Resources for Government and Social Activities), and NDC (Comprehensive, General Resources); NDC areas shown: Social Science, Technology, Natural Science, Arts, Industries]
25. A Model of Metadata Schema Interoperability
- Metadata Schema Concepts from Dublin Core
- Metadata Vocabularies (Metadata Element Sets)
- Application Profiles
- Split Semantics and Syntax
- Metadata Schema Registry
- A system/service to store and provide metadata schemas
26. A structure of a metadata instance
27. Metadata Schema A
28. Metadata Schema B
29. Application Profiles
Interoperability between these schemas
30. Application Profile
[Figure: a structural view of an application profile, drawing on Metadata Vocabulary 1 and Metadata Vocabulary 2 (Metadata Element Sets)]
31. A Layered Model: split semantics and syntax into layers
Layer 3: Concrete Syntax (Implementation)
Layer 2: Abstract Syntax (Application Profiles)
Layer 1: Semantics (Metadata Vocabularies)
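The three layers can be illustrated with a small sketch, assuming a toy vocabulary and profile. Layer 1 is a dictionary of term meanings (paraphrased, not the official DCMI wording), Layer 2 is a hypothetical application profile that says which terms are required or repeatable, and Layer 3 serializes a record to XML. The profile rules, the `record` wrapper element, and the function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace

# Layer 1 (semantics): what each term means, independent of any syntax.
VOCABULARY = {
    "title": "a name given to the resource",
    "creator": "an entity responsible for making the resource",
    "subject": "the topic of the resource",
}

# Layer 2 (abstract syntax): a hypothetical application profile that picks
# terms from the vocabulary and constrains how they may be used.
PROFILE = {
    "title": {"required": True, "repeatable": False},
    "subject": {"required": False, "repeatable": True},
}


def validate(record, profile=PROFILE, vocabulary=VOCABULARY):
    """Check a record (term -> list of values) against the profile."""
    errors = []
    for term in record:
        if term not in vocabulary:
            errors.append(f"unknown term: {term}")
        elif term not in profile:
            errors.append(f"term not in profile: {term}")
    for term, rule in profile.items():
        values = record.get(term, [])
        if rule["required"] and not values:
            errors.append(f"missing required term: {term}")
        if not rule["repeatable"] and len(values) > 1:
            errors.append(f"term not repeatable: {term}")
    return errors


def to_xml(record):
    """Layer 3 (concrete syntax): serialize the record as namespaced XML."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")  # hypothetical wrapper element
    for term, values in record.items():
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{term}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    rec = {"title": ["ULIS-DL"], "subject": ["digital libraries", "metadata"]}
    print(validate(rec))
    print(to_xml(rec))
```

Keeping the layers apart means the same profile could be serialized in another concrete syntax, such as RDF/XML, without touching the vocabulary definitions.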
32. Metadata and Metadata Schema Interoperability
- Do not reinvent the wheel. Do not reinvent metadata schemas.
- Reuse metadata elements
- Use standard encoding schemes to share metadata
- XML, RDF/XML, OWL on WWW
- Share information about metadata schemas
- Metadata schema registry
33. Metadata Schema Registry
- To share information about metadata schemas
- Metadata Vocabularies
- Application Profiles
- etc.
- DCMI Registry
- Register and provide DCMI Terms
- Tsukuba Registry
- A collaborating registry with the DCMI registry
- Extension to local vocabularies
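The relationship between a local registry and a shared one can be sketched as a lookup with fallback. This is a minimal in-memory illustration, not the interface of the DCMI or Tsukuba registries. The class, the delegation to a parent registry, and the local term URI are assumptions.

```python
class SchemaRegistry:
    """In-memory registry of term descriptions, optionally backed by a parent registry."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # assumed: a local registry defers to a shared one
        self._terms = {}

    def register(self, term_uri, description):
        self._terms[term_uri] = description

    def lookup(self, term_uri):
        """Return (registry name, description), or None if the term is unknown."""
        if term_uri in self._terms:
            return self.name, self._terms[term_uri]
        if self.parent is not None:
            return self.parent.lookup(term_uri)
        return None


if __name__ == "__main__":
    shared = SchemaRegistry("shared")
    shared.register("http://purl.org/dc/elements/1.1/title", {"label": "Title"})

    local = SchemaRegistry("local", parent=shared)
    # Hypothetical local extension term.
    local.register("http://example.org/local/audienceAge", {"label": "Audience Age"})

    print(local.lookup("http://purl.org/dc/elements/1.1/title"))
    print(local.lookup("http://example.org/local/audienceAge"))
```

With this arrangement, local vocabularies extend the shared terms instead of redefining them.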
34. Layered Model and Metadata Schema Registry
[Figure: Layer 3 holds XML Schema for A and XML Schema for B; Layer 2 holds Application Profile A and Application Profile B; Layer 1 is served by the DCMI Registry and the Tsukuba Registry]
35. A Metadata Framework for Context Sensitive Resource Selection - an ongoing study -
- Find and access a resource in accordance with user characteristics and user environment
- Users with disabilities
- Size of displays
- PC, PDA, mobile phone
- Environment
- Indoor, outdoor
- etc.
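Since the slides describe this as an ongoing study, the following is only a guess at what context-sensitive selection could look like: each variant of a resource carries metadata about the display it needs and whether it offers a text alternative, and the selector picks the richest variant that fits the user's context. All names, fields, and URLs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One deliverable form of a resource, described by context-related metadata."""
    url: str
    min_display_width: int      # pixels needed to render this variant
    has_text_alternative: bool  # usable without relying on images


def select_variant(variants, display_width, needs_text_alternative=False):
    """Return the richest variant that fits the display and accessibility needs."""
    candidates = [
        v for v in variants
        if v.min_display_width <= display_width
        and (v.has_text_alternative or not needs_text_alternative)
    ]
    if not candidates:
        return None
    # Prefer the variant designed for the largest display that still fits.
    return max(candidates, key=lambda v: v.min_display_width)


if __name__ == "__main__":
    variants = [
        Variant("http://example.org/map/full", 1024, False),
        Variant("http://example.org/map/pda", 320, False),
        Variant("http://example.org/map/text", 120, True),
    ]
    print(select_variant(variants, display_width=1280))
    print(select_variant(variants, display_width=240))
    print(select_variant(variants, display_width=1280, needs_text_alternative=True))
```

Other context dimensions mentioned in the slide, such as indoor or outdoor use, could be added as further fields and filter conditions.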
36. Summary
- Metadata Centric Projects
- Dublin Core and other element sets
- Subject vocabularies were always the central issue for our projects.
- Requirements for (reasonably) small subject vocabularies and domain-specific/regional vocabularies
- Technologies to develop/maintain vocabularies
- Semantic Web technologies
- Metadata Schema Registry
- Metadata Interoperability
- Bridge the gap between global and local requirements
37. Summary
- The information environment for libraries and users is always changing because of the very rapid progress of information and communication technologies.
- Development of the information environment requires not only basic infrastructure but also software and know-how to utilize the infrastructure.
- Crucial Issues
- Human Resource Development
- Collaboration among libraries globally and regionally
38. Thank you! Contact: sugimoto_at_slis.tsukuba.ac.jp