Seamless Searching of Numeric and Textual Resources

1
Seamless Searching of Numeric and Textual Resources
Michael Buckland
School of Information Management and Systems, University of California, Berkeley
2
The Significance of Vocabulary
  • An economic claim: vocabulary problems reduce the
    benefits and return on investment in information
    services.
  • Vocabulary is used for indexicality; therefore,
    issues of identity are central to library and
    information science (LIS).
  • Vocabulary is central to digital libraries.
  • Vocabulary is central to explaining the history of
    conceptions of LIS.

3
God -- Knowableness -- History of doctrines -- Early church, ca. 30-600 -- Congresses.
4
Economic Rationale
  • Massive investment in repositories
  • Large investment in categorization schemes:
    classifications, thesauri, concept codes,
    headings
  • Categorization schemes are usually specialized
    and stylized
  • Increasingly unfamiliar to searchers, hence
    ineffective and inefficient use

5
Remedy
  • Support for searching unfamiliar metadata
    vocabularies
  • Interface to translate the searcher's vocabulary
    into the system's vocabulary
6
Examples
Automobile import and export data (Census Bureau)
  • Automobiles? No data.
  • Cars? Railway or tramway stock.
  • System term: Passenger motor vehicles, spark
    ignition engine.
7
  • In Library of Congress Classification: TL 205
  • In U.S. Patent Classification: 180/280
  • In Standard Industrial Classification: 3711
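
The remedy on slide 5, applied to the automobile example, can be sketched as a two-step lookup: an entry vocabulary maps the searcher's own word to a concept, and a crosswalk gives that concept's designation in each scheme listed on slides 6 and 7. This is a minimal sketch under assumptions: the tables are hand-entered, the names `ENTRY_VOCABULARY`, `CROSSWALK`, `translate`, the concept key, and the scheme label "Census Bureau trade data" are all hypothetical, and a one-to-one table ignores ambiguity (in the Census data, "cars" also matches railway stock).

```python
from typing import Optional

# Hypothetical entry vocabulary: searchers' everyday words -> a concept key.
# The key "passenger_cars" is an illustrative name, not taken from the source.
ENTRY_VOCABULARY = {
    "automobile": "passenger_cars",
    "automobiles": "passenger_cars",
    "car": "passenger_cars",
    "cars": "passenger_cars",
}

# Designations of the same concept in each scheme, as listed on slides 6 and 7.
# The scheme label "Census Bureau trade data" is an assumed name.
CROSSWALK = {
    "passenger_cars": {
        "Census Bureau trade data": "Passenger motor vehicles, spark ignition engine",
        "Library of Congress Classification": "TL 205",
        "U.S. Patent Classification": "180/280",
        "Standard Industrial Classification": "3711",
    },
}


def translate(query: str, scheme: str) -> Optional[str]:
    """Return the term or code used by `scheme` for the searcher's `query`,
    or None when the word is not in the entry vocabulary."""
    concept = ENTRY_VOCABULARY.get(query.strip().lower())
    if concept is None:
        return None
    return CROSSWALK[concept].get(scheme)


print(translate("Cars", "Standard Industrial Classification"))  # 3711
print(translate("Automobiles", "Census Bureau trade data"))
```
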
8
Example: Coastal pollution
F SU COASTAL POLLUTION: 0 records
F TW COASTAL POLLUTION, then SUMMARIZE SUBJECTS:
  • MeSH: Seawater; Water pollution; Bacteria; Water
    microbiology; Air pollution; Environmental
    monitoring; Bathing beaches
  • LCSH: Marine pollution; Coastal zone management;
    Water -- Pollution; Petroleum industry and trade;
    Beach erosion; Coasts; Barrier islands
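
The workaround on this slide, searching title words when the subject search finds nothing and then summarizing the subjects of what comes back, amounts to tallying headings across a retrieved set so the searcher can see the system's own vocabulary. A minimal sketch, assuming each record is a dict with a hypothetical `subjects` field; `summarize_subjects` is an illustrative name, not a MELVYL command, and the records below are invented.

```python
from collections import Counter


def summarize_subjects(records: list[dict], top_n: int = 5) -> list[tuple[str, int]]:
    """Tally the subject headings attached to a retrieved set of records.

    Each record is assumed to be a dict with a "subjects" list. A heading is
    counted at most once per record, so counts are numbers of records.
    """
    counts: Counter = Counter()
    for record in records:
        counts.update(set(record["subjects"]))
    # Most frequent first; ties broken alphabetically so the output is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Invented records standing in for the result of a title-word search.
retrieved = [
    {"subjects": ["Marine pollution", "Coasts"]},
    {"subjects": ["Beach erosion", "Marine pollution"]},
    {"subjects": ["Coastal zone management", "Marine pollution"]},
]
print(summarize_subjects(retrieved, top_n=2))
# [('Marine pollution', 3), ('Beach erosion', 1)]
```
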
9
International Harmonized Commodity Classification System: Computer
  • HS 84 Nuclear reactors, boilers, machines and
    mechanical appliances
  • HS 8471 Automatic data processing machines and
    units thereof, magnetic or optical readers,
    machines for transcribing data
  • HS 847120 Digital auto data proc mach contng in
    the same housing a CPU and input output device
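
The three codes above nest by digit prefix: in the Harmonized System the first two digits give the chapter, four digits the heading, and six digits the subheading. A computer therefore sits under chapter 84, "Nuclear reactors, boilers, machines...", a label few searchers would think to browse. A minimal sketch of walking that hierarchy, assuming the labels are held in a hand-loaded table; `HS_LABELS` and `hs_path` are hypothetical names, and only the three entries from this slide are loaded.

```python
# Labels as quoted on the slide (abbreviations kept as in the source).
HS_LABELS = {
    "84": "Nuclear reactors, boilers, machines and mechanical appliances",
    "8471": ("Automatic data processing machines and units thereof, "
             "magnetic or optical readers, machines for transcribing data"),
    "847120": ("Digital auto data proc mach contng in the same housing "
               "a CPU and input output device"),
}


def hs_path(code: str) -> list[tuple[str, str]]:
    """Return (code, label) pairs from chapter down to the given code.

    HS codes nest by digit prefix: 2 digits = chapter, 4 = heading,
    6 = subheading. Dots such as "8471.20" are ignored.
    """
    digits = code.replace(".", "")
    path = []
    for length in (2, 4, 6):
        if len(digits) < length:
            break
        prefix = digits[:length]
        path.append((prefix, HS_LABELS.get(prefix, "(label not loaded)")))
    return path


for level_code, label in hs_path("8471.20"):
    print(level_code, label)
```
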

10
INSPEC Thesaurus: subdomain-based indexes
  • Water subdomain: Fission reactor safety; Fission
    reactor fuel; Polymers; Organic insulating
    materials; Water supply; Cable insulation;
    Insulation testing; and Insulating oils.
  • Biology subdomain: Water; Biomechanics;
    Physiological models; Neurophysiology; Cellular
    effects of radiation.
  • Information Studies subdomain: Agriculture;
    Natural resources; Forecasting theory; Operations
    research; Erosion.

11
Example: Vietnam War
U.C. MELVYL Online Catalog
  • FIND XSU VIETNAM WAR: 0 records
  • FIND XSU VIETNAMESE CONFLICT: 4,190 records
12
Emanuel Goldberg: aerial photography using a Drachen
Example: Tethered balloons
  • English: Aerostat
  • German: Drachen ("kite" in the dictionary)
13
Entry vocabulary search interfaces
  • Software and algorithms map natural language
    vocabulary to specialized metadata terms.
  • Allows users to enter ordinary-language queries
    while taking advantage of existing subject
    headings and categorization schemes.
  • Uses co-occurrence statistics to link users'
    ordinary-language terms to system vocabularies.
  • Statistical association between lexical items in
    titles and abstracts and the system's metadata
    vocabulary.
  • Suggests the most likely system vocabulary terms.
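
The bullets above describe the core of an entry vocabulary module: learn, from existing records, how words in titles and abstracts co-occur with the metadata terms assigned to those records, then rank metadata terms for a searcher's own words. A minimal sketch under stated assumptions: it scores with a plain conditional-frequency estimate P(term | word), summed over the query words, which is a swapped-in simple measure; the Berkeley prototypes may have used a different association weighting. The function names and the three training records are invented for illustration.

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; no stemming, so "car" and "cars" differ."""
    return re.findall(r"[a-z]+", text.lower())


def train_evm(records):
    """Count co-occurrences between words in titles/abstracts and the
    metadata terms assigned to the same record.

    records: iterable of (title_and_abstract, list_of_metadata_terms).
    Returns (word_counts, pair_counts); both count records, not tokens.
    """
    word_counts: Counter = Counter()
    pair_counts: defaultdict = defaultdict(Counter)
    for text, terms in records:
        for word in set(tokenize(text)):
            word_counts[word] += 1
            for term in set(terms):
                pair_counts[word][term] += 1
    return word_counts, pair_counts


def suggest(query, word_counts, pair_counts, top_n=3):
    """Rank metadata terms for a plain-language query.

    Score = sum over query words of P(term | word), estimated as
    records(word and term) / records(word). Unseen words are skipped.
    """
    scores: Counter = Counter()
    for word in tokenize(query):
        n_word = word_counts[word]
        if n_word == 0:
            continue
        for term, n_both in pair_counts[word].items():
            scores[term] += n_both / n_word
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Invented training records: (title text, assigned metadata terms).
training = [
    ("Passenger cars imported from Japan",
     ["Passenger motor vehicles, spark ignition engine"]),
    ("Exports of passenger cars and parts",
     ["Passenger motor vehicles, spark ignition engine"]),
    ("Freight cars for the railway", ["Railway or tramway stock"]),
]
counts = train_evm(training)
print(suggest("cars", *counts))
```

With these records, "cars" ranks the passenger-vehicle heading first (score 2/3) ahead of railway stock (1/3), while "railway cars" reverses the order (4/3 against 2/3). A real module would also need normalization such as stemming, since this tokenizer treats "car" and "cars" as different words.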

14
Thesaurus navigation
  • Facilitates browsing where structure is present:
    broader, narrower, and related terms
  • Guides the searcher to other parts of the
    structure
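
Thesaurus navigation relies on the broader (BT), narrower (NT), and related (RT) links named above. A minimal sketch of such a structure, assuming terms are plain strings; the `Thesaurus` class and its method names are hypothetical, and the example relationships are invented for illustration rather than taken from LCSH or MeSH.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Concept:
    label: str
    broader: set[str] = field(default_factory=set)   # BT
    narrower: set[str] = field(default_factory=set)  # NT
    related: set[str] = field(default_factory=set)   # RT


class Thesaurus:
    """Minimal thesaurus graph supporting BT/NT/RT browsing."""

    def __init__(self) -> None:
        self.concepts: dict[str, Concept] = {}

    def _concept(self, label: str) -> Concept:
        if label not in self.concepts:
            self.concepts[label] = Concept(label)
        return self.concepts[label]

    def add_broader(self, narrower: str, broader: str) -> None:
        # BT and NT are reciprocal, so record both directions.
        self._concept(narrower).broader.add(broader)
        self._concept(broader).narrower.add(narrower)

    def add_related(self, a: str, b: str) -> None:
        # RT is symmetric.
        self._concept(a).related.add(b)
        self._concept(b).related.add(a)

    def neighbors(self, label: str) -> dict[str, list[str]]:
        """One browsing step: the terms a searcher could move to next."""
        c = self.concepts[label]
        return {"BT": sorted(c.broader), "NT": sorted(c.narrower), "RT": sorted(c.related)}

    def ancestors(self, label: str) -> list[str]:
        """All broader terms reachable from `label`, nearest first."""
        seen: list[str] = []
        queue = deque(sorted(self.concepts[label].broader))
        while queue:
            term = queue.popleft()
            if term in seen:
                continue
            seen.append(term)
            queue.extend(sorted(self.concepts[term].broader))
        return seen


# Invented relationships, for illustration only.
t = Thesaurus()
t.add_broader("Marine pollution", "Water pollution")
t.add_broader("Water pollution", "Pollution")
t.add_related("Marine pollution", "Coastal zone management")
print(t.neighbors("Marine pollution"))
print(t.ancestors("Marine pollution"))  # ['Water pollution', 'Pollution']
```
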

Retrieval set analysis
  • Navigation within a micro-domain

15
Web access
  • WWW forms-based application supported by Perl
  • Supports searches on remote repositories
  • Four subdomain dictionaries in three databases:
    ◦ BIOSIS (Biological Abstracts): subdomain water
    ◦ INSPEC: subdomains information science, water
    ◦ U.S. Patent Office classification
16
Statement of work
  • Varied prototype Entry Vocabulary Modules (EVMs).
  • Unobtrusive development of EVMs by agents.
  • Sensitivity to subdomains.
  • Natural language processing to augment
    statistical term frequency.
  • Recommendations for metadata codebooks for
    numeric databases.
  • www.sims.berkeley.edu/metadata/