Lecture 21: Facetted Classification

1
Lecture 21: Facetted Classification
SIMS 202: Information Organization and Retrieval
  • Prof. Ray Larson & Prof. Marc Davis
  • UC Berkeley SIMS
  • Tuesday and Thursday 10:30 am - 12:00 pm
  • Fall 2004

2
Agenda
  • Facetted Classification
  • Traditional vs. Facetted Classification
  • Designing Facetted Classifications
  • Thesaurus Design
  • Assignment 6
  • Discussion Questions
  • Action Items for Next Time

4
Controlled Vocabularies
  • Vocabulary control is the attempt to provide a
    standardized and consistent set of terms (such as
    subject headings, names, classifications, etc.)
    with the intent of aiding the searcher in finding
    information
  • That is, it is an attempt to provide a consistent
    set of descriptions for use in (or as) metadata

5
Hierarchical Classification
  • Each category is successively broken down into
    smaller and smaller subdivisions
  • No item occurs in more than one subdivision
  • Each level is divided out by a characteristic of
    division (also known as a feature)
  • Example: distinguish Literature based on
    • Language
    • Genre
    • Time Period

Slide author Marti Hearst
6
Hierarchical Classification
Slide author Marti Hearst
7
Labeled Categories for Hierarchical Classification
  • LITERATURE
    • 100 English Literature
      • 110 English Prose
        • 111 English Prose 16th Century
        • 112 English Prose 17th Century
        • 113 English Prose 18th Century
        • ...
      • 120 English Poetry
        • 121 English Poetry 16th Century
        • 122 English Poetry 17th Century
        • ...
      • 130 English Drama
        • 131 English Drama 16th Century
    • 200 French Literature

Slide author Marti Hearst
8
Faceted Categories
  • Mutually exclusive
    • Non-overlapping, distinct categories
  • Relational
    • Relations between facets, subfacets, and foci
      (elements) are not restricted to hierarchical
      generalization-specialization relations
  • Composable
    • Combined using grammars of order and relation to
      form compound descriptions

9
Faceted Classification Along With Labeled Categories
  • A Language
    • a English
    • b French
    • c Spanish
  • B Genre
    • a Prose
    • b Poetry
    • c Drama
  • C Period
    • a 16th Century
    • b 17th Century
    • c 18th Century
    • d 19th Century
  • Aa English Literature
  • AaBa English Prose
  • AaBaCa English Prose 16th Century
  • AbBbCd French Poetry 19th Century
  • BcCd Drama 19th Century

Slide author Marti Hearst
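
The way these compound codes are built can be sketched in a few lines of Python. This is only an illustration: the `FACETS` table layout and the `decode` and `label` helpers are assumptions made for this sketch, not part of the lecture. It reads a code two characters at a time, facet letter first and focus letter second.

```python
# Facet and focus letters are copied from the slide; the data layout is assumed.
FACETS = {
    "A": ("Language", {"a": "English", "b": "French", "c": "Spanish"}),
    "B": ("Genre", {"a": "Prose", "b": "Poetry", "c": "Drama"}),
    "C": ("Period", {"a": "16th Century", "b": "17th Century",
                     "c": "18th Century", "d": "19th Century"}),
}


def decode(code):
    """Split a compound code such as 'AaBaCa' into (facet, focus) pairs."""
    if len(code) % 2:
        raise ValueError(f"malformed code: {code!r}")
    pairs = []
    for i in range(0, len(code), 2):
        facet_letter, focus_letter = code[i], code[i + 1]
        facet_name, foci = FACETS[facet_letter]
        pairs.append((facet_name, foci[focus_letter]))
    return pairs


def label(code):
    """Join the decoded foci into a readable label."""
    return " ".join(focus for _, focus in decode(code))


print(label("AaBaCa"))  # English Prose 16th Century
print(label("AbBbCd"))  # French Poetry 19th Century
```

Because every facet contributes its own letter pair, a facet can be left out of a code without changing how the remaining pairs are read.
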
10
Ranganathan
  • PMEST Facets
    • P(ersonality)
      • WHO: Types of things
    • M(atter)
      • WHAT: Constituent materials
    • E(nergy)
      • HOW: Action or activity terms
    • S(pace)
      • WHERE: Where things occur
    • T(ime)
      • WHEN: When things occur

11
Classical Facet Analysis
  • Entity
  • Kind
  • Part
  • Property
  • Material
  • Process
  • Operation
  • Patient
  • Product
  • By-Product
  • Agent
  • Space
  • Time

12
Classical Facet Analysis
  • What is being done?
    • Entity
    • Kind
    • Product
    • By-Product
  • What are its parts?
    • Part
  • What are its properties?
    • Property
    • Material
  • How is this achieved?
    • Process
  • By what means?
    • Operation
  • By whom?
    • Agent
    • Patient
  • Where?
    • Space
  • When?
    • Time

13
Classical Facet Analysis
  • Nouns
    • Entity
    • Kind
    • Part
    • Patient
    • Product
    • By-Product
    • Agent
  • Adjectives
    • Property
    • Material
  • Intransitive Verb
    • Process
  • Transitive Verb
    • Operation
  • Adverb
    • Space
    • Time

14
Semantic and Syntactic Relationships
  • Semantic relationships
    • Is-A (thing/kind, genus/species)
      • Mammals
        • Primates
          • Humans
    • Has-Parts
      • Human
        • Head
          • Eyes
  • Syntactic relationships
    • Compounds
      • Wheat + harvesting → wheat harvesting
      • Object + operation → operation on object

15
Faceted Classification
  • Clearly distinguishes between semantic
    relationships and syntactic relationships
  • Semantic relationships
    • Within a facet
    • Containment relations
  • Syntactic relationships
    • Across facets
    • Combinatoric relations
  • Have a syntax for syntactic combination of
    semantic terms

16
Power of Facet Combinations
  • The syntactic relations of faceted
    classifications enable a small controlled
    vocabulary to produce
    • Many, many structured descriptions
    • Complex, but formally structured descriptions
      using nested compound descriptions
    • Descriptions for things we do not have words for
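
A quick count makes the point concrete. The sketch below reuses the Language, Genre, and Period foci from the earlier labeled-categories slide; the `descriptions` helper and its `allow_omitted` flag are assumptions made for illustration. It only enumerates flat combinations, so nested compound descriptions would add even more.

```python
from itertools import product

# Foci taken from the Language/Genre/Period example; ten terms in total.
language = ["English", "French", "Spanish"]
genre = ["Prose", "Poetry", "Drama"]
period = ["16th Century", "17th Century", "18th Century", "19th Century"]


def descriptions(*facets, allow_omitted=False):
    """List every description that picks at most one focus per facet."""
    options = [list(f) + ([None] if allow_omitted else []) for f in facets]
    results = []
    for combo in product(*options):
        chosen = [focus for focus in combo if focus is not None]
        if chosen:  # skip the empty description
            results.append(" ".join(chosen))
    return results


vocabulary_size = len(language) + len(genre) + len(period)
print(vocabulary_size)                                                  # 10
print(len(descriptions(language, genre, period)))                       # 36
print(len(descriptions(language, genre, period, allow_omitted=True)))   # 79
```

Ten controlled terms yield 36 fully specified descriptions, or 79 when any facet may be left unspecified.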

17
Example Objects
Red Plastic Glass
Blue Paper Straw
18
Project Team Facetted Classifications
  • 007
    • Personality
      • Straw
      • Glass
    • Operation
      • Drinking
      • Slurping
      • Sipping
    • Material
      • Plastic
      • Paper
    • Color
      • Blue
      • Red
  • ARTery
    • Color
    • Size
    • Material
    • Weight
    • Shape
    • Radius/Circumference
    • Density
    • Volume/Capacity
    • Function/Use
    • Hardness/Softness
    • Yin/Yang

19
Project Team Facetted Classifications
  • Culture Feed
    • Color
      • Red
      • Blue
    • Material
      • Plastic
      • Paper
    • Use
      • Drink from
      • Drink with
    • Dimensions
      • Circumference
      • Height
      • Diameter
  • Picture Portal
    • Color
      • Red
      • Blue
    • Material
      • Paper
      • Plastic
    • Use
      • Containment
      • Transport
    • Shape
      • Torus
      • Planar
    • Holes
      • 0
      • 1

20
Project Team Facetted Classifications
  • F.U.N.
    • Shape
    • Color
    • Material
    • Rigidity
    • Function
      • Container
      • Conduit
    • Locale
    • Weight
    • Size
  • MNM
    • Functionality
      • What it does
      • What you can do with it
    • Physical Properties
      • Color
      • Shape
      • Material

21
Project Team Facetted Classifications
  • pillBox
    • Function
      • Container
      • Conduit
    • Form
      • Shape
        • Cylinder
      • Composition
        • Paper
        • Plastic
      • Color
        • Blue
        • Red
      • Size
        • Tall and skinny
        • Short and fat
  • Team iTour
    • Color
      • Red
      • Blue
    • State
      • Solid
      • Non-porous
      • Flexible
    • Material
      • Plastic
      • Paper
    • Geometry
      • Cylindrical
      • Hollow
    • Function
      • Container
      • Drinking
      • Sucking
      • Blowing

22
Example Objects
Gray Metal Glass
Two Yellow Plastic Straws
23
Example Objects
  • Function
  • Form
    • Shape
    • Material
    • Color
  • Number
  • Function: Drinking
  • Form
    • Shape: Cylinder
    • Material: Plastic
    • Color: Red
  • Number: 1
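
One way to hold such a description in code is a small mapping from facet to focus. The sketch below is an assumption-laden illustration: the facet names follow the slide, but the focus lists in `SCHEME` (drawn from the example objects) and the `describe` helper are invented here. It rejects any focus that is not in the controlled list for its facet.

```python
# Facet names follow the slide; the focus lists and helper are assumptions.
SCHEME = {
    "Function": {"Drinking"},
    "Shape": {"Cylinder"},
    "Material": {"Plastic", "Paper", "Metal"},
    "Color": {"Red", "Blue", "Gray", "Yellow"},
}


def describe(number=1, **foci):
    """Check each focus against its facet and return a structured description."""
    for facet, focus in foci.items():
        if facet not in SCHEME:
            raise KeyError(f"unknown facet: {facet}")
        if focus not in SCHEME[facet]:
            raise ValueError(f"{focus!r} is not a focus of {facet}")
    return {**foci, "Number": number}


red_plastic_glass = describe(
    Function="Drinking", Shape="Cylinder", Material="Plastic", Color="Red"
)
yellow_straws = describe(
    number=2, Function="Drinking", Shape="Cylinder", Material="Plastic", Color="Yellow"
)
print(red_plastic_glass)
print(yellow_straws)
```

Keyword arguments allow only one focus per facet in a single description, which mirrors the mutual-exclusivity requirement within a facet.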

24
Agenda
  • Facetted Classification
  • Traditional vs. Facetted Classification
  • Designing Facetted Classifications
  • Thesaurus Design
  • Assignment 6
  • Discussion Questions
  • Action Items for Next Time

25
Faceted Classification Design
  • Collect examples that need to be classified
  • Identify candidates for facets and subfacets
  • Test classification scheme on examples for facet
    orthogonality
  • Order foci within facets
  • Explicate grammar for ordering and combining
    facets and subfacets
  • Test classification scheme on examples for
    combinatoric power
  • Extend foci for comprehensiveness where
    applicable
  • Create new facets and subfacets where needed
  • Test classification scheme on new examples,
    especially boundary cases
  • Iterate and refine throughout
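
Part of the orthogonality test can be approximated mechanically. The sketch below assumes a draft scheme stored as a dictionary of facet name to focus set, and a hypothetical `overlapping_foci` helper. It flags any focus that appears under more than one facet. Shared terms are only a crude proxy: two facets can overlap conceptually without sharing a single word, so this check supplements testing on real examples rather than replacing it.

```python
from itertools import combinations


def overlapping_foci(scheme):
    """Return {(facet_a, facet_b): shared foci} for every pair that overlaps."""
    overlaps = {}
    for (a, foci_a), (b, foci_b) in combinations(scheme.items(), 2):
        shared = set(foci_a) & set(foci_b)
        if shared:
            overlaps[(a, b)] = shared
    return overlaps


# Hypothetical draft scheme with a deliberate overlap for the demonstration.
draft = {
    "Color": {"Red", "Blue"},
    "Material": {"Plastic", "Paper"},
    "Use": {"Drink from", "Drink with", "Paper"},
}
print(overlapping_foci(draft))  # {('Material', 'Use'): {'Paper'}}
```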

26
Facet Guidelines
  • Terms on the same level in the ontology should be
    of the same level and type
  • Facets, subfacets, and foci should have a
    discernible order
  • Use of capitalization and singular/plural forms
    should be uniform
  • Sports
    • Team Sports
      • Baseball
      • Football
      • Basketball
    • Solo Sports
      • Marathon Running

27
Ordering Foci (Array)
  • Simple to complex
    • (Locomotions: walk, run, jump, skip, hurdle,
      cartwheel)
  • Common/popular to uncommon/unpopular
    • (Vegetarian Pizza Toppings: mushroom, onion,
      olive, artichoke, pineapple, pine nuts)
  • Spatial, geographical, or geometric
    • (Southwestern States: California, Nevada,
      Arizona, New Mexico)
  • Chronological, historical, or evolutionary
    • (Dinosaur Eras: Triassic, Jurassic, Cretaceous)
  • Canonical (pre-established order)
    • (Playground Counting: Eenie, Meenie, Miney, Mo)
  • Alphabetical
    • (Boys' Names: Al, Bob, Chuck, David, Ed, Frank,
      George, Harry)
  • Size
    • (T-Shirts: Small, Medium, Large, XL, XXL)
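
Most of these orderings cannot be derived from the terms themselves, so in software they usually have to be stored explicitly. A minimal sketch, assuming a hypothetical `order_foci` helper: it sorts by position in a stored canonical list when one is given and falls back to alphabetical order otherwise.

```python
SIZE_ORDER = ["Small", "Medium", "Large", "XL", "XXL"]


def order_foci(foci, canonical=None):
    """Sort by position in `canonical` when given, else alphabetically."""
    if canonical is None:
        return sorted(foci)
    rank = {focus: position for position, focus in enumerate(canonical)}
    return sorted(foci, key=rank.__getitem__)


shirts = ["XL", "Small", "XXL", "Large", "Medium"]
print(order_foci(shirts, SIZE_ORDER))  # ['Small', 'Medium', 'Large', 'XL', 'XXL']
print(order_foci(shirts))              # ['Large', 'Medium', 'Small', 'XL', 'XXL']
```

The second call shows why alphabetical order is a poor default for a size facet: it scatters the natural progression.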

28
Agenda
  • Facetted Classification
  • Traditional vs. Facetted Classification
  • Designing Facetted Classifications
  • Thesaurus Design
  • Assignment 6
  • Discussion Questions
  • Action Items for Next Time

29
Why Develop a Thesaurus?
  • To provide a conceptual structure or space for
    a body of information
  • To make it possible to adequately describe the
    topical content of information resources at an
    appropriate level of generality or specificity
  • To provide enhanced search capabilities and to
    improve the effectiveness of searching (i.e., to
    retrieve most of the relevant material without
    too much irrelevant material)

30
Why Develop a Thesaurus?
  • To provide vocabulary (or terminological) control
  • When there are several possible terms designating
    a single concept, the thesaurus should lead the
    indexer or searcher to the appropriate concept,
    regardless of the terms they start with
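
In code, this kind of vocabulary control often reduces to a lookup from entry terms to a preferred term. The sketch below is illustrative only: the `USE` table, its example terms, and the `preferred_term` helper are assumptions, not taken from the lecture.

```python
# Hypothetical entry vocabulary: each variant points to one preferred descriptor.
USE = {
    "car": "Automobiles",
    "cars": "Automobiles",
    "motorcar": "Automobiles",
    "automobile": "Automobiles",
}


def preferred_term(term):
    """Return the preferred descriptor for a term, or None if it is unknown."""
    return USE.get(term.strip().lower())


print(preferred_term("Car"))       # Automobiles
print(preferred_term("motorcar"))  # Automobiles
print(preferred_term("bicycle"))   # None
```

Whichever variant the indexer or searcher starts with, both arrive at the same descriptor.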

31
Preliminary Considerations
  • What is used now?
    • Continue using an existing thesaurus?
    • Ad hoc modification of existing thesaurus?
    • Develop a new well-structured thesaurus?
  • What is the scope and complexity of the subject
    field?
  • What kind of retrieval objects or data will be
    dealt with?
  • How exhaustive and specific is the desired
    description of objects?

32
Preliminary Considerations
  • The scope and complexity of the field will
    provide some indication of the scope and
    complexity of the thesaurus
  • It is better to plan for a larger and more
    comprehensive system than a smaller system that
    rapidly will become inadequate as the database
    grows
  • Development of a good thesaurus requires a major
    intellectual effort as well as clerical
    operations like data entry and production of
    sorted lists

33
Development of a Thesaurus
  • Term selection
  • Merging and development of concept classes
  • Definition of broad subject fields and subfields
  • Development of classificatory structure
  • Review, testing, application, revision

34
Flow of Work in Thesaurus Construction
35
1. Term Selection
  • Select sources for the collection of terms
    • Prearranged Sources
    • Open-ended Sources
  • Assign codes to each source
  • Selection of terms
    • For part of prearranged and for all open-ended
      sources
  • Enter terms into database with all information

36
1.1 Kinds of Sources
  • Prearranged Sources
    • Existing descriptor lists, classification schemes,
      thesauri
      • This includes universal schemes like DDC or LCSH
    • Nomenclatures of single disciplines
    • Treatises on the terminology of a field
    • Encyclopedias, lexica, dictionaries and
      glossaries
    • Tables of contents of textbooks and handbooks
    • Indexes of journals or abstracting journals
    • Indexes of other publications in the field

37
1.1 Kinds of Sources
  • Open-ended sources
    • Lists of search requests or interest profiles
    • Description of projects/activities to be served
      by the information retrieval system
    • Discussion with specialists in the field
    • Sample of documents in the field
      • Ask users why and how these documents relate to
        the field
      • Have documents indexed by experts in the field
    • Lists of titles of documents in the field
    • Abstracts and reviews of documents
    • Your own knowledge

38
Selection of Sources
  • Prearranged sources require less effort in
    gathering the material, and may already indicate
    some relationships between terms and concepts,
    as well as relationships among terms
  • Open-ended sources can reflect current
    terminology and may provide more complete
    coverage
  • Choose a set of sources that are current, as
    complete as possible, and considered authoritative

39
Selection of Sources
  • Each selected source is assigned an ID for
    tracking its use in the development of the
    thesaurus
    • Useful when making decisions about which terms to
      prefer
    • Useful for backtracking when questions arise
      (where did this come from?)

40
Selection of Terms
  • Terms can be transferred directly from
    prearranged sources to the recording medium
    (cards or database)
  • Have to decide which terms and references to
    include, or to take the whole source

41
Selection of Terms
  • In open-ended sources you read through the source
    and pick out terms (i.e. words and phrases) that
    might be useful in retrieval or as references to
    other terms
  • Alternatively, use keyword and phrase extraction
    software to create lists of terms and select from
    those
  • Transfer selected terms to the recording medium
    (cards or database)

42
2. Merging and Development of Concept Classes
  • Sort Term DB into alphabetical order
  • First Round
    • Merge information for identical terms, possibly
      pulling info from additional sources
  • Second Round
    • Merge synonyms or terms in the same concept class
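
The two rounds can be sketched as follows. Everything here is hypothetical: the term records, the source IDs, the `CONCEPT_OF` synonym decisions, and both helper functions. Treating terms that differ only in capitalization as identical is also an assumption of this sketch. The synonym table stands in for the intellectual work of deciding which terms share a concept class.

```python
from collections import defaultdict

# Hypothetical term records: (term, source ID).
records = [
    ("Straw", "S1"), ("straw", "S2"), ("Drinking straw", "S2"),
    ("Glass", "S1"), ("Tumbler", "S3"),
]

# Hypothetical synonym decisions made by the thesaurus builder.
CONCEPT_OF = {"drinking straw": "straw", "tumbler": "glass"}


def first_round(records):
    """Merge records for identical terms, keeping every source ID."""
    merged = defaultdict(set)
    for term, source in records:
        merged[term.lower()].add(source)
    return dict(sorted(merged.items()))  # alphabetical order


def second_round(merged):
    """Merge synonyms into one concept class per preferred term."""
    classes = defaultdict(lambda: {"terms": set(), "sources": set()})
    for term, sources in merged.items():
        concept = CONCEPT_OF.get(term, term)
        classes[concept]["terms"].add(term)
        classes[concept]["sources"] |= sources
    return dict(classes)


merged = first_round(records)
concept_classes = second_round(merged)
print(list(merged))  # ['drinking straw', 'glass', 'straw', 'tumbler']
print(sorted(concept_classes))  # ['glass', 'straw']
```

Keeping the source IDs through both rounds preserves the ability to backtrack to where a term came from.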

43
3. Definition of Broad Subject Fields and Subfields
  • Define broad subject fields and sort terms into
    these broad fields
  • Define subfields within each broad field and sort
    terms into these subfields
  • Work out the detailed structure
  • Select preferred terms
  • Merge information for terms in the same concept
    class
  • Repeat these steps
    • For each subfield within a broad field
    • And for each broad field
    • Until all terms have been consolidated and
      preferred terms selected

44
4. Development of Classificatory Structure
  • Produce preliminary version of classified index
    and update the working database
  • Improve classificatory structure
  • Reality check
  • Produce and distribute a version of the
    classified index
  • Distribute to users/experts

45
5. Final Stages
  • Review
  • Testing
  • Application
  • Revision

46
Review
  • Discuss classified index with users/experts
  • Select descriptors and checklist descriptors
  • Assign notational symbols
  • Produce main thesaurus and indexes

47
Review (cont.)
  • Check cross references and insert where needed
  • Produce test version
  • Test by indexing
  • Modify as needed
  • Produce production version

48
Testing a Thesaurus
  • Assign descriptors to a sample set of NEW
    documents (use enough to get an idea of any gaps
    in the thesaurus)
  • Test retrieval using sample questions and seeing
    how effectively the thesaurus maps to the
    appropriate descriptor
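
A rough way to look for gaps is to run sample terms through the thesaurus and see which ones fail to map. In the sketch below the `USE` table, the sample terms, and the `find_gaps` helper are all hypothetical. The coverage figure is a simple ratio chosen for illustration, not a standard measure.

```python
# Hypothetical lead-in vocabulary: entry term -> descriptor.
USE = {"car": "Automobiles", "automobile": "Automobiles", "bike": "Bicycles"}


def find_gaps(sample_terms, use=USE):
    """Return the mapped descriptors and the terms the thesaurus cannot place."""
    mapped, gaps = {}, []
    for term in sample_terms:
        descriptor = use.get(term.lower())
        if descriptor is None:
            gaps.append(term)
        else:
            mapped[term] = descriptor
    return mapped, gaps


mapped, gaps = find_gaps(["Car", "bike", "scooter"])
coverage = len(mapped) / (len(mapped) + len(gaps))
print(mapped)  # {'Car': 'Automobiles', 'bike': 'Bicycles'}
print(gaps)    # ['scooter']
```

Terms that land in `gaps` are candidates for new descriptors or new cross references.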

49
Art and Architecture Thesaurus
  • http://orange.sims.berkeley.edu/cgi-bin/flamenco/aa/Flamenco

50
Agenda
  • Facetted Classification
  • Traditional vs. Facetted Classification
  • Designing Facetted Classifications
  • Thesaurus Design
  • Assignment 6
  • Discussion Questions
  • Action Items for Next Time

51
Phone Project Assignments
  • Photo Metadata Design (Assignment 6)
  • Having your application and the overall project
    goals in mind, you will design a suitable
    metadata framework to use for annotating photos
    such that all photos would be accessible not only
    for the needs of your particular application, but
    also for the reusability of your photos and
    metadata by other applications.

52
Agenda
  • Facetted Classification
  • Traditional vs. Facetted Classification
  • Designing Facetted Classifications
  • Thesaurus Design
  • Assignment 6
  • Discussion Questions
  • Action Items for Next Time

53
Discussion Questions
  • Paul Poling on Broughton
  • What are the major inadequacies of 19th century
    classification systems which faceted
    classification overcomes?
  • Some answers
    • They don't "display very much in the way of
      internal logic, or fundamental structural
      principles"
    • Ineffectual at addressing the specific problems
      of vocabulary
    • They do not consider the precise relations
      between concepts
    • Multilingual switching is difficult, particularly
      in group/set names
    • They "fail to make adequate distinction between
      permanent hierarchical relationships, and
      relationships of syntactic association in
      complexes. As a result, structures are not
      logical (since the analysis is not rigorous),
      positioning of compound subjects is not
      predictable (since no operating rules for
      combination are normally present), and retrieval
      is unreliable"

54
Discussion Questions
  • Paul Poling on Broughton
  • The author makes the somewhat startling claim
    that, "the fundamental thirteen categories have
    been found to be sufficient for the analysis of
    vocabulary in almost all areas of knowledge." 
    Are there any exceptions to this that come to
    mind?

55
Discussion Questions
  • Paul Poling on Broughton
  • Broughton later notes that some aspects of
    digital materials cannot be represented by the 13
    categories used for the BC2 system.  For use with
    our cameraphones, what are some categories that
    would need to be included?   More importantly,
    what is the minimum set of additional categories
    needed? 

56
Discussion Questions
  • Paul Poling on Broughton
  • Broughton states that, "There is no obvious way
    in which the core vocabulary can be dealt with by
    machines...the initial allocation of vocabulary
    to categories must be carried out
    intellectually."  The author goes on to suggest
    that all but the initial category assignments can
    be done by a computer.  How feasible is the BC2
    system for the web, considering this requirement,
    when one considers the fairly rapidly expanding
    categories in so many fields of human
    knowledge?  

57
Discussion Questions
  • Steve Chan on Broughton
  • The category system used in Bliss/BC2 is based on
    a general-to-specific ordering and on 13
    functional categories. How do you think that
    Lakoff's ideas of basic-level categories, and
    the importance of metaphor/embodiment, relate to
    the categories chosen in Bliss/BC2?

58
Discussion Questions
  • Steve Chan on Broughton
  • Many of the relationships in the categories fall
    into types such as "is a kind of" or "is a part
    of". These are very similar to the predicates in
    WordNet. As a thought experiment, what would it
    take to interface WordNet into something like
    BC2, so that documents could be parsed for
    content and then automatically categorized? Would
    you want to let such a system generate the
    categories?

59
Discussion Questions
  • Scott Fisher on Faceted Classification
  • What are some different ways of ordering the
    facets within a classification notation?  When
    might one ordering be more appropriate than
    another?  Why might the result be especially
    important for non-electronic documents?

60
Discussion Questions
  • Scott Fisher on Faceted Classification
  • Why is it important that characteristics of
    division be mutually exclusive?  Explain what
    might happen if they are not.

61
Discussion Questions
  • Morgan Ames on Vickery
  • Though facets are a powerful tool for organizing
    information, they can be very time-consuming to
    define.  Vickery describes the creation of
    facets, starting with the analysis of terms used
    by a user group, then the sorting of the terms
    into facets, the development of facets (depending
    on how often they're used), the arrangement of
    the facets, and finally, the establishment of a
    notation for the facets.  Could one automate some
    or all of the process of defining facets for a
    particular area - say, an online community?  If
    so, which parts could be automated, and how?  If
    not, why not - what are the limitations of
    automation?

62
Discussion Questions
  • Morgan Ames on Vickery
  • How do the properties of facets compare with the
    properties of relational databases?

63
Discussion Questions
  • Lilia Manguy on Thesaurus Construction
  • The reading mentions thesauri being constructed
    for institutions. What are some examples of
    institutions with specialized thesauri? Why were
    they deemed necessary?

64
Discussion Questions
  • Lilia Manguy on Thesaurus Construction
  • In our field, what are some scenarios in which a
    thesaurus would need to be constructed? How would
    you determine who would be your expert
    consultants? Who would you choose?

65
Discussion Questions
  • Lilia Manguy on Thesaurus Construction
  • Using the process outlined in the reading for
    constructing a thesaurus, how would you qualify
    whether your thesaurus is good or bad?

66
Discussion Questions
  • Christine Jones on Card Sorting
  • Considering the "vocabulary problem" laid forth
    in "The Vocabulary Problem in Human-System
    Communication," by Furnas et al., do you think
    the card sorting technique is an effective
    approach for categorizing information for the
    SunWeb Intranet, i.e., do you think menus and the
    search function contain vocabulary users will
    understand? Would you recommend any other tools
    for the user to increase their understanding of
    the SunWeb information space?

67
Discussion Questions
  • Christine Jones on Card Sorting
  • Usability studies including card sorting, icon
    intuitiveness testing, card distribution to
    icons, and thinking aloud walkthrough were
    performed and the results were based in part on
    subjective interpretation. For example, instead
    of depending on formal statistics, eyeballing the
    data was used and when deciding whether to keep
    icons, the user interface designers made the
    final decisions. Do you think this level of
    subjective interpretation was justified for a
    project of this nature? What (if any) changes
    would you make to this approach if the project
    was a redesign or design of Sun's external
    Website?

68
Discussion Questions
  • Carrie Burgener on Flamenco
  • How do the search and browse functions used by
    Flamenco compare to Bates's Berrypicking Model?

69
Discussion Questions
  • Carrie Burgener on Flamenco
  • The examples in the article were collections of
    images that had existing metadata associated. It
    has been presented in IS203 that people take
    pictures and generally do not organize them. How
    can the UI design of Flamenco be applied to photo
    annotation?

70
Discussion Questions
  • Carrie Burgener: References for Flamenco
    • PhotoCompas tool using Flamenco interface
      • http://shark.stanford.edu:4230/cgi-bin/flamenco/mor_full/Flamenco?username=default
    • Presentation by Professor Hearst
      • http://bailando.sims.berkeley.edu/talks/dli02.ppt
    • Different article
      • http://www.sims.berkeley.edu/~hearst/papers/cacm02.pdf

71
Agenda
  • Facetted Classification
  • Traditional vs. Facetted Classification
  • Designing Facetted Classifications
  • Thesaurus Design
  • Assignment 6
  • Discussion Questions
  • Action Items for Next Time

72
Homework (!)
  • Assignment 6
    • Due Thursday, November 18
  • Read
    • Textbook: Organization of Information, Chapters
      3-5 (Taylor)
      • Chitra: 3
      • Shufei: 4
      • Jaime: 5