Title: Chapter 9 Designing Metadata
1. Chapter 9: Designing Metadata
2. Metadata Definition
- Data about data, information about information
- Metadata is a structured description of a data object
- Metadata encodes all physical data (contained in software and other media) and knowledge-containing information (contained in employees and various media) from inside and outside an organization, including information about physical data, business and technical processes, rules and constraints of the data, and structures of the data used by a corporation
- Metadata can be used to describe the data's behavior, processes, rules, and structure
3. Metadata and UCS
- In a unified content strategy (UCS), metadata can facilitate content search and retrieval, reuse, and dynamic content delivery, because you can determine not only what the content is, but also who uses it, how it will be used, how it will be delivered, and when
- In UCS, metadata enables
  - Effective retrieval
  - Systematic reuse
  - Automatic routing based on workflow status
  - Tracking of status
  - Reporting
4. Benefits of Metadata to UCS
- Reduction of redundant content
  - Authors can easily retrieve existing reusable content
  - CM can use metadata to identify multiple versions of the same content
  - Systematic reuse
- Improved workflow
  - Metadata for status (ready for review, ready for publication)
- Reduced costs
5. Metadata Examples
6. Card Catalog and Metadata
- A card catalog identifies what books are in the library and where they are physically located
- It can be searched by subject area, author, or title (resource discovery)
- By showing the author, number of pages, publication date, and revision history of each book, the card catalog helps you determine which books will satisfy your needs (resource evaluation)
- Metadata does not need to be digital
7. Data Dictionary and Metadata
- Metadata repository
  - A centralized repository of information about data, such as definitions, relationships, origin, domain, usage, and format
- RDBMS schema
8. Bibliographic Metadata
- MARC
  - A comprehensive, well-developed, carefully controlled scheme intended to be generated by professional catalogers for use in libraries
- Dublin Core
  - An intentionally minimalist standard intended to be applied to a wide range of digital library (DL) materials by people who are not trained in library cataloging
- Refer
9. Metadata in the Traditional Library
- Metadata refers to cataloging or indexing information that libraries create to arrange, describe, and enhance access to an information object
- Example
  - MARC (MAchine-Readable Cataloging format)
  - LCSH (Library of Congress Subject Headings)
- Descriptive metadata describes the properties of an information object
10. MARC
- Developed in the late 1960s at the Library of Congress
- Promotes the sharing of catalog entries among libraries
- A comprehensive and detailed standard whose use is carefully controlled
- Information includes author, type of material, information about the physical material itself, publishers, some notes, and identifiers
- Cataloging is governed by a detailed set of rules and guidelines called AACR2(R)
- Internally, MARC records are stored as a collection of tagged fields in a fairly complex format
11. Library Catalog Record
12. MARC Fields in the Record of the Previous Slide
13. Meaning of Some MARC Fields
14. Refer
- Originally designed by computer scientists for use mainly by scientific and technical researchers
- Used in some bibliographic tools such as EndNote
- Format
  - Formatted line by line; records are separated by a blank line
  - Each line starts with a key character, introduced by a percent symbol, that signals the kind of information the line contains
  - The rest of the line contains the data itself
- The original Refer has no provision for the type of bibliographic record (for example, journal article)
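The line-by-line layout described above can be read with a few lines of Python. This is a minimal sketch, not a complete Refer implementation: the function name `parse_refer` and the sample entries are hypothetical, and only the common key characters %A (author), %T (title), and %D (date) appear in the sample.

```python
def parse_refer(text):
    """Split Refer-formatted text into records, each a list of (key, value) pairs."""
    records = []
    current = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line separates one record from the next.
            if current:
                records.append(current)
                current = []
        elif line.startswith("%") and len(line) >= 2:
            # The character after the percent symbol is the key;
            # the rest of the line is the data itself.
            current.append((line[1], line[2:].strip()))
    if current:
        records.append(current)
    return records


# Hypothetical sample data, for illustration only.
SAMPLE = """\
%A A. Author
%A B. Author
%T A Hypothetical Title
%D 1999

%A C. Author
%T Another Hypothetical Title
"""

records = parse_refer(SAMPLE)
print(len(records))
print(records[0])
```

Each record is kept as a list of pairs rather than a dictionary because a key such as %A may repeat within one record.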
15. Bibliography Item in Refer Format
16. Basic Keywords Used by Refer
17. Types of Metadata
- UCS requires two types of metadata
  - Categorization metadata
  - Element metadata
18. Categorization Metadata
- Information needs to be organized in a logical structure and categorized for effective retrieval
- Use the categories to add metadata to the information
- Metadata is like the old card catalog: it presents information to users in context and enables them to quickly find relevant information
- Metadata hierarchies or metadata taxonomies are used to organize the content
19. Metadata Hierarchies
- Represented as tree structures
- A hierarchy provides the content user with an understanding of how content is organized
- Content may be organized under multiple categories (multiple access points)
- Content users use hierarchies to retrieve content because hierarchies give them multiple paths to the same information
20. Metadata Taxonomies
- Represented as tree structures
- Content may be categorized in only one place
- Taxonomies are used by authors to ensure that content is categorized in only one way
- Categorizing content in multiple ways makes it difficult to retrieve
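The difference between the two structures comes down to how many categories one piece of content may sit under. The sketch below illustrates that rule only; the class names `Taxonomy` and `Hierarchy`, the category paths, and the sample item are all hypothetical.

```python
class Taxonomy:
    """Each content item may be categorized in only one place."""

    def __init__(self):
        self.category_of = {}

    def categorize(self, item, category):
        existing = self.category_of.get(item)
        if existing is not None and existing != category:
            raise ValueError(f"{item!r} is already categorized under {existing!r}")
        self.category_of[item] = category


class Hierarchy:
    """Each content item may be organized under multiple categories."""

    def __init__(self):
        self.categories_of = {}

    def categorize(self, item, category):
        self.categories_of.setdefault(item, set()).add(category)

    def paths_to(self, item):
        # Every category is one access point to the same item.
        return sorted(self.categories_of.get(item, set()))


taxonomy = Taxonomy()
taxonomy.categorize("Dental claim form", "Company benefits/Forms")

hierarchy = Hierarchy()
hierarchy.categorize("Dental claim form", "Company benefits/Forms")
hierarchy.categorize("Dental claim form", "Company benefits/Frequently asked questions")
print(hierarchy.paths_to("Dental claim form"))
```

Trying to file the same item under a second category raises an error in the taxonomy, while the hierarchy simply records another path.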
21. Categorization Metadata Creation
- Industry-specific taxonomies (vertical taxonomies)
  - Industries may also create standards for the format, structure, and syntax of metadata to enable different organizations, and even different departments within an organization, to share metadata
- Create the categorization metadata yourself
  - Include corporate librarians or information architects
  - You need to understand your users. Ask the following questions:
    - Who is going to retrieve the content?
    - What tasks are they trying to accomplish with the content?
    - What terms will they use when retrieving the content?
22. Create the Categorization Metadata Yourself
- Grouping or clustering related content
  - Company benefits
    - Benefit policies
    - Benefit forms
    - Benefit frequently asked questions
  - Company benefits (refined and simplified)
    - Policies
    - Forms
    - Frequently asked questions
23. Create the Categorization Metadata Yourself (Cont.)
- Developing your taxonomy
  - As you group content, categorize it, and define the terms used to identify it, you are automatically creating your taxonomy
  - Each term in your taxonomy becomes metadata
- Testing your taxonomy to ensure that it is appropriate and comprehensive
  - Categorize some sample content and ask users (audiences) to perform a usability test
24. Categorization Metadata Standards
- Dublin Core
- RDF
- XMP
- Crosswalks
25. Element Metadata
- Element metadata identifies your content at the element level, based on the elements defined in the information model
- Authors use element metadata to help them manage content throughout the authoring process
- Three main types of element metadata
  - Reuse metadata
  - Retrieval metadata
  - Tracking metadata
26. Metadata for Reuse
- Identifies the components of content that can be reused in multiple areas
  - Example: content type = overview, product = ABC
- Authors can search CM by metadata for reusable content
- CM can automatically search for appropriate reusable content (based on models and metadata) and deliver it to authors (systematic reuse)
27. Design Metadata for Reuse
- Determine the business result you are trying to achieve and build the metadata backward from that result. Think about the following:
- Where is the content going to be reused?
  - Across products? Across information products?
  - You need to create metadata to identify each "yes" (reuse)
    - Product: ABC, EFG, HIJ
    - Information product: Brochure, Web, Help, User guide
  - Metadata such as information product can be derived from the template type
28. Design Metadata for Reuse (Cont.)
- What type of content is it?
  - The element content type for which the content is valid
    - Overview, caution, warning, troubleshooting, example
  - Metadata such as content type can be derived from your model or semantic tags
- What else do you need to know about the content to ensure that the correct piece of content is reused?
  - Version: 1, 2, 2.5
  - Region: United States, Taiwan, Canada
  - Audience: consumer, decision maker, technical support
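A search for reusable content by metadata can be sketched as a simple filter. The component records, their IDs, and the function name `find_reusable` below are hypothetical; a real CM system would run the equivalent query against its repository.

```python
# Hypothetical component records carrying reuse metadata.
COMPONENTS = [
    {"id": "c1", "content type": "overview", "product": "ABC",
     "version": "2", "region": "United States", "audience": "consumer"},
    {"id": "c2", "content type": "warning", "product": "ABC",
     "version": "2", "region": "Canada", "audience": "technical support"},
    {"id": "c3", "content type": "overview", "product": "EFG",
     "version": "2.5", "region": "Taiwan", "audience": "decision maker"},
]


def find_reusable(components, criteria):
    """Return the components whose metadata matches every criterion."""
    return [
        component
        for component in components
        if all(component.get(key) == value for key, value in criteria.items())
    ]


matches = find_reusable(COMPONENTS, {"content type": "overview", "product": "ABC"})
print([component["id"] for component in matches])
```

Adding more criteria (version, region, audience) narrows the result, which is how the extra metadata helps ensure that the correct piece of content is reused.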
29. Metadata for Retrieval
- Helps authors retrieve content
- May include much or all of the metadata used for reuse
- More extensive than metadata for reuse, providing additional information about an element that facilitates retrieval
- Metadata for retrieval examples
  - Title/Subject
  - Author
  - Date (creation, completion, modification)
  - Keywords
  - Security level (who can view the content)
30. Design Metadata for Retrieval
- Determine the business result you are trying to achieve and build the metadata backward from that result. Think about the following:
- Who is going to retrieve your content?
  - Authors: fine granularity
  - Users: whole documents
- In what form do they want to retrieve the content?
  - Authors: metadata that defines the source format and the desired format
  - Users: metadata that defines the appropriate format for the content
31. Design Metadata for Retrieval (Cont.)
- What permissions should users have for retrieving content?
  - Each element, container, and information product needs to have appropriate security permissions expressed through metadata
- How are they going to specifically identify the desired content?
  - Analyze the terms your authors and users will use, then determine what metadata the content should carry to enable a match between the search and the content
  - Consider adding keywords to metadata to facilitate retrieval
32. Metadata for Tracking (Status)
- Useful when you are implementing workflow in UCS
  - Determine which elements are active
  - Control what can be done to an element and who can do it
  - Automatically or manually change status metadata
- Example
  - Content status: indicates the status of the content
    - Draft, Ready for review, In review, Final, In approval, Approved, Published
  - Review status: indicates the status of the reviewed content
    - Accept, Reject
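Status metadata of this kind lends itself to automatic updates as content moves through a workflow. The sketch below makes two assumptions that the slide does not state: that the content statuses form a simple linear sequence, and that a rejected review sends content back to Draft. The function names are hypothetical.

```python
CONTENT_STATUSES = [
    "Draft", "Ready for review", "In review", "Final",
    "In approval", "Approved", "Published",
]


def advance(status):
    """Move content to the next status (assumes a linear workflow)."""
    position = CONTENT_STATUSES.index(status)
    if position == len(CONTENT_STATUSES) - 1:
        raise ValueError("Published content has no further status")
    return CONTENT_STATUSES[position + 1]


def apply_review(status, review_status):
    """Update the content status from a review status of Accept or Reject."""
    if status != "In review":
        raise ValueError("only content that is in review can be accepted or rejected")
    if review_status == "Accept":
        return advance(status)
    if review_status == "Reject":
        # Assumption: rejected content returns to the author as a draft.
        return "Draft"
    raise ValueError(f"unknown review status: {review_status!r}")


print(advance("Draft"))
print(apply_review("In review", "Accept"))
```

A real workflow would define its own transitions, which is one reason to design tracking metadata only after the workflow itself has been designed.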
33. Design Metadata for Tracking
- Determine the business result you are trying to achieve and build the metadata backward from that result
- Design your metadata for tracking after you have designed your workflow
- Identify other metadata that can help you to track content
  - Who created the content (author)?
  - When was it created/modified (date)?
  - Who modified the content (editor)?
  - Who reviewed/approved the content (reviewer/approver)?
  - How long does it take to create/modify/review (time)?
  - Where has it been reused (information product, product)?
  - Has it been translated (content status)?
34. Creating a Controlled Vocabulary
- Metadata needs to be consistent to facilitate reuse, retrieval, and tracking, which requires a controlled vocabulary
- A controlled vocabulary reconciles all the various possible words that can be used to identify content and differentiates among all the possible meanings that can be attached to certain words
- Using an unlimited or uncontrolled set of metadata terms leads to additional work for authors and reduces the percentage of content that can be effectively retrieved
35. To Create a Controlled Vocabulary
- Identify your metadata categories (Content type, Product)
- Identify the terms that make up each metadata category
  - Content status (metadata category)
  - Draft, Ready for review, In review, Final, In approval, Approved, Published (controlled terms)
- If possible, do not provide any metadata that can be defined by the author. If that is not possible, monitor the uncontrolled metadata terms to see whether patterns are emerging that could then be used to create a controlled vocabulary
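Both steps, checking terms against the vocabulary and monitoring the uncontrolled ones for patterns, can be sketched briefly. The dictionary layout, the function names, and the sample records (including the author-defined term "WIP") are hypothetical.

```python
from collections import Counter

# Metadata categories mapped to their controlled terms.
CONTROLLED_VOCABULARY = {
    "content status": {
        "Draft", "Ready for review", "In review", "Final",
        "In approval", "Approved", "Published",
    },
    "content type": {"Overview", "Caution", "Warning", "Troubleshooting", "Example"},
}


def uncontrolled_terms(metadata):
    """Return the (category, term) pairs that are not in the controlled vocabulary.

    A category that is missing from the vocabulary counts as uncontrolled.
    """
    return [
        (category, term)
        for category, term in metadata.items()
        if term not in CONTROLLED_VOCABULARY.get(category, set())
    ]


def emerging_patterns(records):
    """Count uncontrolled terms across records to reveal recurring usage."""
    counts = Counter()
    for metadata in records:
        counts.update(uncontrolled_terms(metadata))
    return counts


# Hypothetical author-supplied metadata.
RECORDS = [
    {"content status": "Draft", "content type": "Overview"},
    {"content status": "WIP"},
    {"content status": "WIP", "content type": "Caution"},
]

print(emerging_patterns(RECORDS).most_common())
```

A term that keeps recurring in the counts is a candidate either for mapping onto an existing controlled term or for addition to the vocabulary.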
36. Ensuring Metadata Gets Used
- Whenever possible, automate the application of metadata (reduces author burden and inconsistency)
  - Categorization metadata based on content
  - Metadata based on the template and model
  - Inheritance of metadata based on the parent
  - Metadata based on position in the workflow
- If it is necessary for authors to add metadata, make it possible for them to add it as they are authoring so that they don't have to wait until the content is checked into CM
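Inheritance from the parent is one of the simpler forms of automation to picture. In the sketch below, an element's effective metadata is built by merging what its ancestors carry with what it declares itself; the node names, the `PARENTS` and `OWN_METADATA` tables, and the rule that a child's value overrides its parent's are all assumptions made for illustration.

```python
# Hypothetical content tree: each node maps to its parent (None for the root).
PARENTS = {
    "user-guide": None,
    "chapter-1": "user-guide",
    "overview": "chapter-1",
}

# Metadata declared directly on each node.
OWN_METADATA = {
    "user-guide": {"product": "ABC", "information product": "User guide"},
    "overview": {"content type": "overview"},
}


def effective_metadata(node, parents, own_metadata):
    """Merge metadata from the root down so that a child overrides its ancestors."""
    chain = []
    while node is not None:
        chain.append(node)
        node = parents.get(node)
    merged = {}
    for name in reversed(chain):
        merged.update(own_metadata.get(name, {}))
    return merged


print(effective_metadata("overview", PARENTS, OWN_METADATA))
```

The author of the overview only supplies the content type; product and information product arrive from the parent without any extra work.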
37. Metadata Sharing
- Industry-specific standards
- Consider using RDF to design your metadata
- Consider using a crosswalk (a table to map
metadata from one structure to another) to
provide a metadata interchange
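A crosswalk is, at its simplest, a lookup table. The sketch below maps a hypothetical set of in-house field names onto Dublin Core element names (title, creator, date, subject); the mapping itself, the sample record, and the function name `apply_crosswalk` are illustrative assumptions rather than a published crosswalk.

```python
# Hypothetical crosswalk: in-house field name -> Dublin Core element name.
CROSSWALK = {
    "Title": "title",
    "Author": "creator",
    "Date": "date",
    "Keywords": "subject",
}


def apply_crosswalk(record, crosswalk):
    """Map a record into the target structure.

    Returns the mapped record plus any fields the crosswalk does not cover,
    so that nothing is silently lost during interchange.
    """
    mapped = {}
    unmapped = {}
    for field, value in record.items():
        if field in crosswalk:
            mapped[crosswalk[field]] = value
        else:
            unmapped[field] = value
    return mapped, unmapped


# Hypothetical source record.
record = {
    "Title": "Product ABC overview",
    "Author": "A. Author",
    "Security level": "internal",
}

mapped, unmapped = apply_crosswalk(record, CROSSWALK)
print(mapped)
print(unmapped)
```

Fields with no counterpart in the target structure, such as the security level here, are reported separately so that someone can decide how to handle them.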
38. Another View of Types of Metadata
39. Types of Metadata
- Structure metadata: The ruling monarch of metadata. It precedes most other kinds of metadata by creating structural divisions in your content.
- Format metadata: Applies to any level of structure that you define and marks how you intend to render that structure.
- Access metadata: Organizes the structures that you create into hierarchies and other access structures.
- Management metadata: The data that you attach to structures to administer and track them.
- Inclusion metadata: Stands in for external content. It marks the place where the external content is to go.
40. Structure Metadata
- Structure metadata says, "You call this stuff . . ."
- Before you can say anything more about something, you must name that something
- Basic structure metadata: characters, words, paragraphs
- Elements
  - Collections of characters, words, or paragraphs that you intend the reader to take as a unit (such as a title).
  - An element is the smallest structure that you intend to access separately in your system.
- Components
  - Collections of elements that you intend the user to take as a whole (such as a white paper).
  - Components are the structures that you intend to manage.
  - They're the structures to which you apply management and access metadata.
41. Structure Metadata (Cont.)
- Nodes
  - Collections of components that, after publication, you intend the reader to take as a unit.
  - On Web sites, nodes are pages. In print materials, nodes are sections (headings, chapters, parts, and so on).
- Publications
  - Collections of nodes that you intend readers to take as a unit (a single department's intranet site, for example).
  - On the Web, you set off publications from each other mostly by using graphic conventions and the internal navigation conventions of the site. (A site may have one or more publications on it.)
  - In print, you most often delineate publications by using a file boundary.
- Publication groups
  - Collections of publications that you intend the reader to take as a unit (the volumes in an encyclopedia, for example).
  - Set off both on the Web and in print by the formatting conventions and navigational structures that you provide for moving between publications in the group.
42. Structure Metadata Example
  <COLLECTION>
    <PUB>
      <SECTION>
        <NODE>
          <HEADER>...</HEADER>
          <COMPONENTS>
            <COMPONENT>
              <ELEMENT>
                <PARA>
                  <COMPONENT>...</COMPONENT>
                </PARA>
              </ELEMENT>
            </COMPONENT>
          </COMPONENTS>
          <FOOTER>...</FOOTER>
        </NODE>
      </SECTION>
    </PUB>
  </COLLECTION>

  <ELEMENT>
    <PARA>
      This is my body text, and in it I'm embedding an image.
      <MEDIA ID="m1" URL="dabw.jpg">
        <SIZE>100,300</SIZE>
        <CAPTION>This is a separate component</CAPTION>
      </MEDIA>
    </PARA>
    Normal things seem strange if you really think about them!
  </ELEMENT>
43. Format Metadata
- Format metadata says, "Here's how to render the stuff that I surround."
- Format metadata can apply to any level of structure in your system.
- In many cases, the structural tags themselves are what you interpret and turn into platform-specific formatting metadata
  <SECTION LEVEL="1">Some Section</SECTION>
  becomes
  <H1>Some Section</H1>
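The conversion of a structural tag into platform-specific formatting can be sketched with Python's standard XML library. This is only an illustration of the idea: the function name `section_to_html` and the choice to default a missing LEVEL to 1 are assumptions.

```python
import xml.etree.ElementTree as ET


def section_to_html(xml_text):
    """Turn <SECTION LEVEL="n">text</SECTION> into <Hn>text</Hn>."""
    section = ET.fromstring(xml_text)
    # Assumption: a section without a LEVEL attribute is treated as level 1.
    level = section.get("LEVEL", "1")
    heading = ET.Element(f"H{level}")
    heading.text = section.text
    return ET.tostring(heading, encoding="unicode")


print(section_to_html('<SECTION LEVEL="1">Some Section</SECTION>'))
```

The content carries only the structure (a level 1 section); the decision to render it as an H1 heading is made at publication time and could differ for another platform.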
44. Format Metadata Example
  <COLLECTION>
    <PUB DISPLAY="child">
      <SECTION>
        <NODE>
          <HEADER>...</HEADER>
          <COMPONENTS LAYOUT="table">
            <COMPONENT>
              <ELEMENT TYPEFACE="Arial">
                <PARA STYLE="body">
                  Some <FORMATTAG>text</FORMATTAG>
                  <COMPONENT>...</COMPONENT>
                </PARA>
              </ELEMENT>
            </COMPONENT>
          </COMPONENTS>
          <FOOTER>...</FOOTER>
        </NODE>
      </SECTION>
    </PUB>
  </COLLECTION>
- Format metadata like this usually belongs in a template, not in a content structure
45. Access Metadata
- Access metadata says, "Here is how this structure fits in with the rest."
- You most often use it to gain access to the content.
- You can store access metadata within a component or outside it in a separate place.
- The types of access metadata correspond to the types of access structures: hierarchy, index, associations, and sequences.
46. Access Metadata Example
  <COLLECTION>
    <PUB>
      <SECTION>
        <NODE KEYWORDS="rollup">
          <HEADER>...</HEADER>
          <COMPONENTS>
            <COMPONENT INDEX="term1,term2,term3">
              <ELEMENT>
                <PARA>
                  <COMPONENT>...</COMPONENT>
                  <LINK TARGET="C123">For more info, see</LINK>
                </PARA>
              </ELEMENT>
            </COMPONENT>
          </COMPONENTS>
          <FOOTER>...</FOOTER>
        </NODE>
      </SECTION>
    </PUB>
  </COLLECTION>
For more information, <A HREF></A>
For more information, see Links in Chap. 5
47. Access Metadata (Cont.)
- Access metadata is as often outside the content structure as it is inside.
- Instead of typing the terms into the component, you type the component into the terms
  <INDEX>
    <TERM>
      <NAME>NOAA</NAME>
      <COMPONENTS>C123,C456,C789</COMPONENTS>
    </TERM>
  </INDEX>
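The two placements hold the same information, so one can be derived from the other. The sketch below inverts INDEX attributes stored on components into a term-to-components index; the sample XML, the second term "weather", and the function name `build_index` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical content in which each component carries its own index terms.
CONTENT = """
<COMPONENTS>
  <COMPONENT ID="C123" INDEX="NOAA,weather"/>
  <COMPONENT ID="C456" INDEX="NOAA"/>
  <COMPONENT ID="C789" INDEX="weather,NOAA"/>
</COMPONENTS>
"""


def build_index(xml_text):
    """Invert per-component INDEX attributes into a term -> component IDs map."""
    index = {}
    root = ET.fromstring(xml_text.strip())
    for component in root.iter("COMPONENT"):
        for term in component.get("INDEX", "").split(","):
            term = term.strip()
            if term:
                index.setdefault(term, []).append(component.get("ID"))
    return index


print(build_index(CONTENT))
```

Keeping the index outside the components means a term can be renamed or merged in one place instead of in every component that uses it.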
48. Management Metadata
- Management metadata is there to help you keep track of and administer content.
  - ID, Title, Author, Create date, Modify date, Status, Size, Owner, Publish date, Expire date
- Management metadata isn't always only for management.
  - Any of the types listed here can just as easily be considered content to publish as well as data to help manage the content to publish.
  - Whether or not you show the values of these metadata elements to your audience, their use to you is the same: to help you keep track of and administer your content.
49. Management Metadata Example
  <COLLECTION>
    <PUB ID="p1">
      <SECTION ID="s1">
        <NODE ID="n1">
          <HEADER>...</HEADER>
          <COMPONENTS>
            <COMPONENT ID="C123">
              <TITLE></TITLE>
              <ADMIN>
                <OWNER>O234</OWNER>
                <CREATE>9/23/01</CREATE>
                <MODIFY>9/30/01</MODIFY>
                <STATUS>Status1</STATUS>
              </ADMIN>
              <ELEMENT NAME="intro">
                <PARA ID="p1">...</PARA>
              </ELEMENT>
            </COMPONENT>
          </COMPONENTS>
          <FOOTER>...</FOOTER>
        </NODE>
      </SECTION>
    </PUB>
  </COLLECTION>
50. Inclusion Metadata
- Inclusion metadata says, "Put the following external entity here."
- It enables you to reference content that isn't physically in the content structure.
51. Inclusion Metadata Example
  <ELEMENT>
    <PARA>
      This is my body text, and in it I'm embedding an image.
      <MEDIA ID="m1" URL="dabw.jpg">
        <SIZE>100,300</SIZE>
        <CAPTION>This is a separate component</CAPTION>
      </MEDIA>
    </PARA>
    Normal things seem strange if you really think about them!
  </ELEMENT>
- Shortcomings
  - This is HTML
  - The image and its caption are locked in this location
  - The reference may break
  - You have no place to put other info that you may need
52. Inclusion Metadata (Cont.)
- If you really intend to make the <MEDIA> element a separate component, you're better off not embedding it directly in another component by pointing to its URL, but instead referencing it there by its ID
- The m1 component is stored with the other "m" components, where you can more easily find, manage, and include it in other places in the content structure
  <ELEMENT>
    <PARA>
      This is my body text, and in it I'm embedding an image.
      <INCLUDE REFID="m1"/>
      Normal things seem strange if you really think about them!
    </PARA>
  </ELEMENT>
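Resolving a reference by ID at publication time can be sketched with Python's standard XML library. The component store `MEDIA_STORE`, the function name `resolve_includes`, and the decision to fail loudly on an unknown ID are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical store that holds the "m" components, keyed by ID.
MEDIA_STORE = {
    "m1": (
        '<MEDIA ID="m1" URL="dabw.jpg">'
        "<SIZE>100,300</SIZE>"
        "<CAPTION>This is a separate component</CAPTION>"
        "</MEDIA>"
    ),
}

DOCUMENT = (
    "<ELEMENT><PARA>"
    "This is my body text, and in it I'm embedding an image."
    '<INCLUDE REFID="m1"/>'
    "Normal things seem strange if you really think about them!"
    "</PARA></ELEMENT>"
)


def resolve_includes(xml_text, store):
    """Replace every <INCLUDE REFID="..."/> with the referenced component."""
    root = ET.fromstring(xml_text)
    # Snapshot the elements first so the tree can be modified safely.
    for parent in list(root.iter()):
        for position, child in enumerate(list(parent)):
            if child.tag == "INCLUDE":
                # An unknown REFID raises KeyError: a broken reference is
                # reported rather than silently published.
                replacement = ET.fromstring(store[child.get("REFID")])
                # Keep the text that followed the INCLUDE tag.
                replacement.tail = child.tail
                parent[position] = replacement
    return root


resolved = resolve_includes(DOCUMENT, MEDIA_STORE)
print(ET.tostring(resolved, encoding="unicode"))
```

Because the paragraph holds only the ID, the same media component can be included elsewhere, and its URL, size, or caption can change without touching the paragraphs that reference it.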