Title: Taxonomies and Metadata for Content Management
1Taxonomies and Metadata for Content Management
Michael Huff, Information Resource Officer, U.S. Department of State
2E-Government Act of 2002
- The use of computers and the Internet is rapidly
transforming societal interactions and the
relationships among citizens, private businesses,
and the Government.
- The Federal Government has had uneven success in
applying advances in information technology to
enhance governmental functions and services,
achieve more efficient performance, increase
access to Government information, and increase
citizen participation in Government.
- Most Internet-based services of the Federal
Government are developed and presented
separately, according to the jurisdictional
boundaries of an individual department or agency,
rather than being integrated cooperatively
according to function or topic.
3Which U.S. Government organizations are
experienced in using metadata and taxonomy tools?
- Defense Intelligence Agency
- USDA Economic Research Service (ERS)
- Federal Aviation Administration
- FirstGov
- NASA
- Small Business Administration
- Social Security Administration
- Department of State
5Taxonomy
Metadata
6Why use metadata?
- Adding metadata to unstructured content allows it
to be managed like structured content.
- Enriching content with structured metadata is
critical for supporting search and personalized
content delivery.
- Content that has been adequately tagged with
metadata can be leveraged in usage tracking,
personalization and improved searching.
7Where does metadata fit in the information system
architecture?
User experience. How content is presented, and how
users experience and interact with it, dictates
its perceived and actual value.
Content architecture. Scalable metadata framework to
enable content reuse and handle changes in
organization goals, user needs, and retrieval
concerns.
Tools and technology. The information
supply-chain platform that enables workflows and
supports organizational and operational concerns.
9What is Dublin Core?
- Dublin Core is a metadata standard for
describing Internet resources so they are easy to
find.
1995: Original workshop held in Dublin, Ohio.
2003: Dublin Core approved as ISO 15836.
2004: Shanghai meeting.
For more information: http://www.dublincore.org
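As a rough illustration of how a Dublin Core description can travel with a web page, the sketch below renders a record as HTML `<meta>` tags using the common `DC.element` naming convention. The function name `dc_meta_tags` and the sample record are hypothetical; only the fifteen element names come from the Dublin Core element set.

```python
from html import escape

# The fifteen elements of the simple Dublin Core element set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

def dc_meta_tags(record):
    """Render a {element: [values]} record as a list of HTML tag strings.

    Hypothetical helper for illustration; every element may repeat.
    """
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, values in record.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        for value in values:
            content = escape(value, quote=True)  # keep the attribute well-formed
            lines.append(f'<meta name="DC.{element}" content="{content}">')
    return lines

# Sample record with made-up values.
sample = {
    "title": ["Taxonomies & Metadata"],
    "subject": ["Taxonomy", "Metadata"],
    "language": ["en"],
}
for line in dc_meta_tags(sample):
    print(line)
```

Repeating an element, as with `subject` here, is how several controlled terms can be attached to the same page.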
10Why is metadata important?
Better navigation and discovery
More efficient editorial process
http://dublincore.org/documents/dcmi-terms/
11What is a taxonomy?
The specification of the names of people, places,
things and everything else that is needed to
allow search engines and other content
applications to work better.
Linnaeus (biological classification)
Kingdom: Animalia
Phylum: Chordata
Class: Mammalia
Order: Carnivora
Family: Canidae
Genus: Canis
Species: C. familiaris
UNSPSC (product and service classification)
Segment: 44 - Office Equipment and Accessories and Supplies
Family: .12 - Office Supplies
Class: .17 - Writing Instruments
Commodity: .05 - Mechanical pencils; .06 - Wooden pencils; .07 - Colored pencils
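The UNSPSC example can be read as one code built from four two-digit levels. The sketch below splits such a code into its levels and lists its ancestors. It assumes the usual 8-digit form, in which broader levels are written with trailing zeros; `parse_unspsc` and `ancestors` are hypothetical helper names.

```python
from typing import List, NamedTuple

class UnspscCode(NamedTuple):
    segment: str
    family: str
    class_: str
    commodity: str

def parse_unspsc(code: str) -> UnspscCode:
    """Split an 8-digit code into segment, family, class, and commodity."""
    if len(code) != 8 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected 8 digits, got {code!r}")
    return UnspscCode(code[0:2], code[2:4], code[4:6], code[6:8])

def ancestors(code: str) -> List[str]:
    """Return the code at every level, broadest first (assumed zero padding)."""
    p = parse_unspsc(code)
    return [
        p.segment + "000000",
        p.segment + p.family + "0000",
        p.segment + p.family + p.class_ + "00",
        code,
    ]

# Mechanical pencils from the slide: 44 / .12 / .17 / .05
print(parse_unspsc("44121705"))
print(ancestors("44121705"))
```

Because every commodity code carries its parents, a search for the class "Writing Instruments" can match on the first six digits without any lookup table.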
12Sample Recipe Taxonomy
Main Ingredients: Chocolate, Dairy, Fruits, Grains, Meat,
Seafood, Nuts, Olives, Pasta, Spices, Seasonings, Vegetables
Cooking Methods: Advanced, Bake, Broil, Fry, Grill, Marinade,
Microwave, No Cooking, Poach, Quick, Roast, Sauté, Slow
Cooking, Steam, Stir-fry
Courses: Appetizers, Beverages, Breads, Cheese, Cocktails,
Desserts, Fish, Shellfish, Fruit, Hors d'Oeuvres, Meat, Pasta,
Salad, Sandwiches, Soup, Vegetables
Meal Type: Breakfast, Brunch, Lunch, Supper, Dinner, Snack
Cuisines:
- African
- American
- Asian
- Caribbean
- Continental
- Eclectic/ Fusion/ International
- Jewish
- Latin American
- Mediterranean
- Middle Eastern
- Vegetarian
Each facet is a controlled vocabulary.
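One way to picture the recipe taxonomy is as a set of independent controlled vocabularies, one per facet. The sketch below stores abbreviated versions of the lists above and checks a proposed set of tags against them. The names `RECIPE_FACETS` and `validate_tags` are hypothetical and not taken from any tool mentioned in the slides.

```python
from typing import Dict, List

# Abbreviated vocabularies; a real taxonomy would carry the full lists.
RECIPE_FACETS = {
    "Main Ingredients": {"Chocolate", "Dairy", "Fruits", "Grains", "Meat", "Seafood"},
    "Cooking Methods": {"Bake", "Broil", "Fry", "Grill", "Roast", "Stir-fry"},
    "Courses": {"Appetizers", "Desserts", "Salad", "Soup"},
    "Meal Type": {"Breakfast", "Brunch", "Lunch", "Dinner", "Snack"},
    "Cuisines": {"African", "American", "Asian", "Caribbean", "Mediterranean"},
}

def validate_tags(tags: Dict[str, List[str]]) -> List[str]:
    """Return a list of problems; an empty list means every tag is allowed."""
    errors = []
    for facet, values in tags.items():
        allowed = RECIPE_FACETS.get(facet)
        if allowed is None:
            errors.append(f"unknown facet: {facet}")
            continue
        for value in values:
            if value not in allowed:
                errors.append(f"{facet}: '{value}' is not in the controlled vocabulary")
    return errors

recipe = {"Cuisines": ["Asian"], "Cooking Methods": ["Stir-fry"], "Meal Type": ["Dinner"]}
print(validate_tags(recipe))                      # no problems
print(validate_tags({"Cuisines": ["Martian"]}))   # one problem reported
```

Keeping each facet as its own small vocabulary is what lets one facet be edited without touching the others.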
13The power of taxonomy facets
- 4 independent categories of 10 nodes each have
the same discriminatory power as one hierarchy of
10,000 nodes (10^4)
- Easier to maintain
- Can be easier to navigate
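The arithmetic behind the facet claim can be checked in a few lines. This is a minimal sketch: it counts how many nodes have to be maintained across four facets of ten terms each, and how many distinct combinations those facets can express. The facet sizes are the ones stated on the slide; the variable names are illustrative.

```python
from itertools import product
from math import prod

facet_sizes = [10, 10, 10, 10]  # four independent facets, ten nodes each

nodes_to_maintain = sum(facet_sizes)        # terms an editor has to manage
distinct_combinations = prod(facet_sizes)   # 10 * 10 * 10 * 10

# Cross-check by enumerating every combination of one term per facet.
enumerated = sum(1 for _ in product(*(range(n) for n in facet_sizes)))

print(nodes_to_maintain, distinct_combinations, enumerated)
```

The gap between 40 maintained terms and 10,000 expressible combinations is the maintenance advantage the slide refers to.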
14Seven common taxonomy facets
Personalized content delivery requires defining
taxonomy facets
and re-use of existing vocabulary sources
15Applying the facets to the Dublin Core metadata
elements
Applied taxonomy metadata facilitates a
multi-faceted view of content
16Facets at work on FirstGov site
http://www.firstgov.gov
17Powered by
- Guided Navigation
- 2-3 clicks to product
- No dead ends
http://www.tesco.com/winestore
18http://www.towerrecords.com
19Powered by
http://www.fortunoff.com
20Seven practical rules for taxonomies
- Incremental, extensible process that identifies
and enables owners, and engages stakeholders.
- Quick implementation that provides measurable
results as quickly as possible.
- Not monolithic: has separately maintainable
facets.
- Re-uses existing IP as much as possible.
- A means to an end, and not the end in itself.
- Not perfect, but it does the job it is supposed
to do, such as improving search and navigation.
- Improved over time, and maintained.
22- Creating a taxonomy is only part of the job
- How will it be put to use?
- In a new application, or by modifying an existing
application?
- What's the effort around that?
- Additional issues
- Tagging: Who will add the metadata, and how?
24Task 1: Identify objectives
What do you do?
What kinds of digital assets are being produced? For what audiences?
What is the business process for submitting, selecting,
editing, and maintaining digital assets?
How many digital assets are there? How fast is this growing?
Are there particular industry or other standards that are important?
What types of assets are hard to search for (that should be
easier to find)?
What tools would be helpful in locating assets? Acronyms?
Abbreviations? Nicknames? Glossary? Thesaurus? Taxonomy?
Who else should we be talking to?
25Task 2: Inventory content
26Task 3: Specify metadata
Legend: ? = 1 or more; - = 0 or more
27Task 4: Model content
- Factor asset types from inventory into canonical
types.
- Select examples from inventory (possibly
with spider).
- Identify useful chunks for each asset type.
- Factor chunks into element superset.
- Identify relationships between chunks.
- Iterate until all agree on asset types, elements, and
relationships.
Page areas shown in the example: Header area, Left navigation
area, Main content area, Footer area
28Task 5: Specify vocabularies
- Develop broad taxonomy outline (1-3 levels deep).
- Review, revise, and approve taxonomy
outline with stakeholders and subject matter
experts.
- Fill in taxonomy outline.
- Tag random samples from content inventory.
- Review, revise, and approve draft taxonomy with
stakeholders and subject matter experts.
29Task 6: Specify procedures
- Develop taxonomy style rules, and ensure that the
taxonomy follows them.
- Develop tagging rules and procedures, along with
software to assist in the task.
- Specify the taxonomy maintenance process and
the update procedures to follow.
30Task 6: Governance and maintenance
The taxonomy must be changed over
time. Suggestions for changes can come from
users, through query log analysis, and from staff,
through a feedback form. A governance structure is needed
to make sure changes are justified.
Inputs: Content; Taxonomy; End User (query log analysis);
Staff (notes missing concepts)
Recommendations by Editor:
1 Small taxonomy changes (labels, synonyms)
2 Large taxonomy changes (retagging, application changes)
3 New best bets content
Steering Committee considerations:
1 Business goals
2 Change in user experience
3 Retagging cost
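Query log analysis can start very simply. The sketch below counts user queries and reports the frequent ones that match no taxonomy term, as candidates for the editor to review. The function name `missing_concepts`, the `min_count` cutoff, and the sample data are assumptions made for illustration; a real log would need more careful normalization, such as stemming and synonym handling.

```python
from collections import Counter

def missing_concepts(queries, taxonomy_terms, min_count=2):
    """Return (query, count) pairs for frequent queries absent from the taxonomy."""
    known = {term.lower() for term in taxonomy_terms}
    counts = Counter(q.strip().lower() for q in queries if q.strip())
    return [
        (query, n)
        for query, n in counts.most_common()   # most frequent first
        if query not in known and n >= min_count
    ]

# Made-up sample log and taxonomy.
log = ["visa", "Passport", "passport ", "visa", "visa", "travel warning"]
taxonomy = ["Visa"]
print(missing_concepts(log, taxonomy))
```

The output is only a list of suggestions; deciding whether a candidate justifies a taxonomy change stays with the editor and the steering committee.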
31Task 6: Steering Committee roles
Business Lead
- Keeps committee on track with larger business objectives
- Balances cost/benefit issues to decide appropriate levels of
effort (specialists help in estimating costs)
- Obtains needed resources if those in committee can't
accomplish a particular task
Technical Specialist
- Estimates costs of proposed changes in terms of amount of
data to be retagged, additional storage and processing
burden, software changes, etc.
- Helps obtain data from various systems
Content Specialist
- Committee's liaison to content creators
- Estimates costs of proposed changes in terms of editorial
process changes, additional or reduced workload, etc.
Taxonomy Specialist
- Suggests potential taxonomy changes based on analysis of
query logs and indexer feedback
- Makes edits to taxonomy, installs into system with aid of IT
specialist
Content Owner
- Reality check on process change suggestions
32Task 7: Train staff
Staff will require training on:
- The UI they use to tag the content
- The rules to follow when deciding what codes to apply
- The end-effect of the codes they apply
- The structure of the taxonomy
Tagging examples come from the content inventory.
Hard copies of the taxonomy, and yellow highlighters, are
helpful during training.
Indexing UI
33What about Automatic Categorization?
- Automatic vs. manual categorization is a
cost/benefit tradeoff.
- Semi-automated recommended over pure manual in
production situations.
- Automatic performance not bad, but not equal to
trained manual tagging.
- Software is not sane, so errors look crazy.
- Large backlogs of content can't justify the
investment in high-quality manual tagging.
- Old articles rarely accessed.
- Recommend automated bulk tagging with error
reporting and correction process.
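The semi-automated approach can be sketched as a suggest-then-review loop: software proposes tags with a score, confident suggestions are applied, and the rest go to a person. The example below uses a toy keyword matcher as a stand-in for real categorization software; the rule format, the `accept_at` threshold, and both function names are assumptions.

```python
import re

def suggest_tags(text, keyword_rules):
    """Score each tag by the share of its keywords found in the text."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    scores = {}
    for tag, keywords in keyword_rules.items():
        hits = sum(1 for keyword in keywords if keyword in words)
        if hits:
            scores[tag] = hits / len(keywords)
    return scores

def route(text, keyword_rules, accept_at=0.5):
    """Split suggestions into auto-accepted tags and tags needing human review."""
    scores = suggest_tags(text, keyword_rules)
    accepted = sorted(tag for tag, score in scores.items() if score >= accept_at)
    review = sorted(tag for tag, score in scores.items() if score < accept_at)
    return accepted, review

# Made-up rules for two recipe tags.
rules = {
    "Seafood": ["fish", "shrimp"],
    "Grill": ["grill", "grilled", "barbecue", "charcoal"],
}
print(route("Grilled shrimp with fish sauce", rules))
```

Raising `accept_at` sends more items to reviewers and lowers the error rate, which is one concrete form of the cost/benefit tradeoff described above.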
34What about automatically-created taxonomies?
- Typically a single hierarchy with no overall plan
- Results hard for people to navigate
What about automatic categorization?
- Accuracy close to human levels, but errors are very
different
- Cost/benefit tradeoff
- Semi-automation is best practice
35Enterprise taxonomy maintenance workflow
[Workflow diagram]
Roles: Analyst, Editor, Copywriter, Sys Admin
System: Taxonomy Tool
Steps: Suggest new name/category; Review new name; Copy edit
new name; Add to enterprise taxonomy
Decision points: Problem? (Yes/No), shown twice
36Categorize with a purpose
What is the problem you are trying to solve?
- Improve search
- Browse for content on an enterprise-wide portal
- Enable users to syndicate content
- Otherwise provide the basis for content re-use
How will you control the cost of creating and maintaining the
metadata needed to solve these problems?
- CMS with metadata tagging products
- Semi-automated classification
- Taxonomy editing tools
- Guided navigation tools
37How do you sell it?
- Don't sell the taxonomy; sell the vision of what
you want to be able to do.
- Clearly understand what the problem is and
what the opportunities are.
- Costs and benefits
- Design the taxonomy in relation to the value at
hand.
38Internet Resources
39U.S. Government Resources
40http://www.nasa.gov/home/index.html
41http://pub-lib.jpl.nasa.gov/pub-lib/dscgi/ds.py/View/Collection-10
42http://www.loc.gov/flicc/wg/taxonomy.html
43http://www.loc.gov/lexico/servlet/lexico/
44http://www.archives.gov/federal_register/code_of_federal_regulations/thesaurus.html
45http://feapmo.gov/
46http://www.km.gov/
47Other Resources
48http://www.educause.edu/asp/taxonomy/show_taxonomy_links.asp?TREE1EXPAND1
49http://databases.unesco.org/thesaurus/
50http://www.naa.gov.au/recordkeeping/control/functions_thesaur/contents.html
51http://www.taxonomystrategies.com/html/bibliography.htm
52Summary
- Why taxonomies?
- Why metadata?
53Shiyali Ramamrita Ranganathan
54Ranganathan's Five Laws of Library Science
- Books are for use (They don't belong on the
shelf)
- Books are for all: every reader his book (Every
reader is unique)
- Every book its reader (Every book is unique)
- Save the time of the reader (Make libraries easy
to use)
- A library is a growing organism (Libraries are
constantly changing to meet changing patron
needs)
55Thank you
- Michael Huff, Information Resource Officer, U.S.
Department of State, huffmp_at_state.gov