Vocabulary - PowerPoint PPT Presentation

Provided by: tefkosa8

Transcript and Presenter's Notes

Title: Vocabulary


1
Vocabulary languages in searching
  • Connection
  • indexing
  • searching

2
Basic assertion
  • Indexing and searching inexorably connected
  • you cannot search for what was not first indexed
    in some manner or other
  • indexing of documents or objects is done in order
    to be searchable
  • there are many ways to do indexing
  • to index one needs an indexing language
  • there are many indexing languages
  • even taking every word in a document is an
    indexing language
  • Knowing searching is knowing indexing

3
General definitions
  • Vocabulary (Encarta Dictionary)
  • 1. words known
  • LANGUAGE - all the words used by or known to a
    particular person or group, or contained in a
    language as a whole
  • Language
  • 1. speech of group
  • the speech of a country, region, or group of
    people, including its diction, syntax, and
    grammar
  • 2. system of communication
  • a system of communication with its own set of
    conventions or special words

4
From general to specific
  • These general definitions are valid for
    application in indexing and searching, to define
  • index terms
  • indexing vocabulary
  • indexing language
  • search terms
  • search vocabulary
  • query (request, search) language

5
Specific
  • Index term
  • a word or phrase that denotes (describes) a
    concept and connotes (implies) a class

Example: the index term "table" denotes a concept
and connotes a class, the many kinds of tables,
for which, if desired, we may have more specific
index terms
6
Specific ...
  • Indexing vocabulary
  • a set of index terms used in a domain or for a
    set of documents or objects
  • it could be even a single document or object e.g.
    a book
  • Indexing language
  • an indexing vocabulary together with rules
    (syntax, grammar) for its application and use

7
Specific ...
  • Search terms
  • a counterpart to index terms, also denoting a
    concept and connoting a class for a search
  • Search vocabulary
  • a set of search terms in a domain or available in
    a system
  • Query language
  • a search vocabulary together with rules for its
    use in searching

8
More
  • An index language is the language used to
    describe documents and requests.
  • The elements of the index language are index
    terms, which may be derived from the text of the
    document to be described, or may be arrived at
    independently.
  • The vocabulary of an index language may be
    controlled or uncontrolled.
  • (van Rijsbergen, 1979)

9
Controlled vocabulary
  • Predetermined: indicates which terms are to be
    used in indexing
  • may show definition of and relations between
    terms
  • examples: thesaurus, subject heading list,
    classification
  • Also indicates terms that may be selected for
    searching
  • An indexing AND a searching tool
  • Human constructed
  • and costly to construct and use

10
Uncontrolled vocabulary
  • Derived from documents
  • nowadays automatically
  • using various ways or algorithms
  • constant issue: which way is better
  • Used to construct inverted indexes
  • a concordance, such as one of the Bible, indicating
    the place and position of each word in the text,
    is an inverted index
  • monks used to do it in the 12th century; computers
    do it today
  • Inverted indexes are used for free text
    searching

11
Controlled vs. free text searching
  • Endless source of debate and controversy
  • But each has its place for a given circumstance
    and retrieval goal
  • Each has strengths and weaknesses
  • can you list or find a list comparing them?
  • Users mostly use free text searching
  • Professional searchers use both as warranted
  • As option
  • KNOW THY CONTROLLED VOCABULARY

12
Inverted indexes
  • Useful to know how they function in order to
    understand search and retrieval. Steps:
  • Each document is indexed
  • every word in a document is taken as an index
    term, with the exception of stop words
  • position in text is noted
  • Indexes for all documents are merged
  • index terms are arranged alphabetically in the
    bowels of the system
  • under each index term are the numbers of the
    documents in which it appears and its position in
    the text of each document
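
The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular retrieval system is built: the two documents are made up, the stop list is an assumption borrowed from the example slide further on, and build_inverted_index is a hypothetical name.

```python
from collections import defaultdict

# Assumed stop list (a, of, in); real systems use longer ones.
STOP_WORDS = {"a", "of", "in"}

def build_inverted_index(documents):
    """Return {term: {doc_number: [word positions]}}, terms in alphabetical order."""
    index = defaultdict(dict)
    for doc_number, text in documents.items():
        # Step 1: every word is taken as an index term and its position is noted.
        for position, word in enumerate(text.lower().split(), start=1):
            if word in STOP_WORDS:
                continue  # stop words are not indexed but still occupy a position
            index[word].setdefault(doc_number, []).append(position)
    # Step 2: the per-document entries are merged and arranged alphabetically.
    return dict(sorted(index.items()))

# Hypothetical documents, one sentence each.
documents = {
    1: "a digital collection of maps",
    2: "libraries in a digital age",
}
index = build_inverted_index(documents)

for term, postings in index.items():
    print(term, postings)
```

Under each term the sketch keeps the document numbers and, for each document, the word positions, which is what the later slides on AND and phrase searching rely on.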

13
So, when you search
  • for digital AND libraries
  • computer takes all documents under digital
  • and all documents under libraries
  • compares to see which documents have both terms
    and then
  • provides you the list of those documents in a
    default format or you may choose a format
  • This is also called coordinate indexing
  • coordination is done at time of searching
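
A small sketch of that coordination step, assuming the index has already been reduced to a set of document numbers per term. The postings and the function name search_and are invented for illustration.

```python
# Hypothetical postings: index term -> set of document numbers containing it.
postings = {
    "digital": {1, 2, 4},
    "libraries": {2, 3, 4},
}

def search_and(postings, *terms):
    """Return the sorted document numbers that contain every one of the terms."""
    sets = [postings.get(term, set()) for term in terms]
    if not sets:
        return []
    # Coordination happens here, at search time: keep documents found under all terms.
    return sorted(set.intersection(*sets))

print(search_and(postings, "digital", "libraries"))
```

A term that is not in the index has an empty posting set, so any AND search that includes it retrieves nothing.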

14
Variation when you search
  • for digital (WITH) libraries, or
  • digital libraries, i.e., as a phrase
  • computer goes through the same steps as before
    but then also
  • looks for documents where digital is positioned
    right before libraries
  • remember: the computer knows the position of each
    term in each document and each sentence
  • So searching for a phrase is a form of searching
    of terms connected with AND but in a given
    sequence
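
The extra adjacency check can be sketched as follows, assuming a positional index of the form term -> document number -> word positions. The data and the name search_phrase are hypothetical, and the sketch handles two-word phrases only.

```python
# Hypothetical positional index: term -> {doc_number: [word positions]}.
positional_index = {
    "digital": {1: [2], 2: [4, 9]},
    "libraries": {1: [7], 2: [5]},
}

def search_phrase(index, first, second):
    """Return documents in which `first` occurs immediately before `second`."""
    first_docs = index.get(first, {})
    second_docs = index.get(second, {})
    hits = []
    # Same as an AND search so far: only documents containing both terms.
    for doc in sorted(first_docs.keys() & second_docs.keys()):
        second_positions = set(second_docs[doc])
        # Extra step: is `second` at the position right after `first`?
        if any(pos + 1 in second_positions for pos in first_docs[doc]):
            hits.append(doc)
    return hits

print(search_phrase(positional_index, "digital", "libraries"))
```

Document 1 contains both words but five positions apart, so only document 2 comes back.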

15
Example of inverted file
For simplicity, each document has one sentence.
Stop words: a, of, in.
Search for slow AND truck gets as results
documents 1 and 3, since both contain slow and
truck.
Search for slow (w) truck retrieves only document
3, in which slow is the 7th word and truck is the
8th, so they are right next to each other. Doc 1
has both words, but not next to each other, and
thus is not retrieved.
16
Thesaurus
  • Good old Peter Mark Roget had a most useful idea
    and did a great job
  • Following this idea, the thesaurus became THE major
    tool for controlled vocabulary in information
    retrieval (IR)
  • starting in the 1950s and to this day, many IR
    thesauri have been developed
  • all have a similar structure and function
  • but they are difficult and costly to construct

17
What is a thesaurus?
  • For writers, it is a tool like Roget's: one
    with words grouped and classified to help select
    the best word to convey a specific nuance of
    meaning.
  • For indexers and searchers, it is an information
    storage and retrieval tool: a listing of words
    and phrases authorized for use in an indexing
    system, together with relationships, variants and
    synonyms, and aids to navigation through the
    thesaurus.
  • (Milstead, 2000)

18
more
  • A thesaurus to an information scientist is a
    controlled set of the terms used to index
    information in a database, and therefore also to
    search for information in that database so the
    same concepts are represented by the same term.
  • (Batty, 1998)

19
Basic thesaurus components
  • For each entry, a thesaurus has a classification
    grid
  • Descriptor (DE): an index term that has
  • Scope note (SN): context in which used
  • Broader terms (BT): higher in a hierarchy
  • Narrower terms (NT): lower in a hierarchy
  • Related terms (RT): other connected descriptors
  • Used for (UF): synonyms that are not
    descriptors
  • Note: not all of these may be present for every
    descriptor
  • A searcher or indexer can use these as a guide
    for selection/rejection and for browsing to get
    ideas
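
One way to picture such an entry is as a small record with optional fields. The sketch below is an assumption-laden illustration: the class, the display function, and the sample entry are all invented here and are not taken from ERIC or any other real thesaurus.

```python
from dataclasses import dataclass, field

@dataclass
class ThesaurusEntry:
    descriptor: str                                     # DE
    scope_note: str = ""                                # SN
    broader: list[str] = field(default_factory=list)    # BT
    narrower: list[str] = field(default_factory=list)   # NT
    related: list[str] = field(default_factory=list)    # RT
    used_for: list[str] = field(default_factory=list)   # UF

def display(entry):
    """Format an entry in the usual DE/SN/BT/NT/RT/UF layout, skipping empty parts."""
    lines = [f"DE {entry.descriptor}"]
    if entry.scope_note:
        lines.append(f"SN {entry.scope_note}")
    for label, terms in (("BT", entry.broader), ("NT", entry.narrower),
                         ("RT", entry.related), ("UF", entry.used_for)):
        lines.extend(f"{label} {term}" for term in terms)
    return "\n".join(lines)

# Made-up entry; it has no scope note and no related terms.
entry = ThesaurusEntry(
    descriptor="tables",
    broader=["furniture"],
    narrower=["coffee tables", "dining tables"],
    used_for=["desks"],
)
print(display(entry))
```

Empty fields default to nothing and are simply not displayed, which mirrors the note that not every descriptor carries every component.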

20
Examples of thesauri
  • Thesauri have been constructed for a great many
    domains, from A to Z
  • here are some lists
  • international multilingual thesauri
  • online thesauri
  • among them ERIC Thesaurus (we use it for
    example)
  • BUT different thesauri may and do treat the same
    descriptor (index term) differently
  • having different, more or fewer narrower,
    broader, related terms
  • thus it is dangerous to use them interchangeably

21
Standard structure
With variations on the theme, thesauri have
similar conceptual structure to guide searcher or
indexer
Note: not every descriptor has to have all
of these
22
Same thesaurus but
  • Examples of ERIC (Educational Resources
    Information Center) thesaurus as used differently
    in different systems
  • ERIC's own system
  • ERIC file on DIALOG (begin 1)
  • ERIC file on OVID (accessible through RUL)
  • Notice how each uses the thesaurus and displays
    search in its own way, but the principles are
    still the same
  • Oh well

23
ERIC online thesaurus on ERIC
  • Allows for
  • searching for words that are included in
    descriptors by category or all categories
  • browsing alphabetically
  • browsing in one of about 40 categories
  • Search for library in all categories found 76
    descriptors that have library included
  • Out of these selected library education

24
ERIC online thesaurus on ERIC: descriptor library
education
25
ERIC thesaurus on DIALOG
  • In a convoluted way, the ERIC thesaurus (and other
    ones) can be displayed on DIALOG (and on other
    vendors, such as OVID)
  • How?
  • begin in file 1, ERIC
  • then expand a desired term; here we used the term
    library
  • you will see under R that certain terms have
    related terms, meaning that these are thesaurus
    entries
  • then expand on one of those to see related terms
  • then you can browse and choose which ones to use
    in a search
  • And here are Print Screens of the process

26
going
Expand library
27
going
28
going
We now chose descriptor LIBRARY ADMINISTRATION
and expand on that one
Neat trick
You can expand on an expand to get related terms
29
going
These are now R terms of various types
14 related terms for this one are listed
Can expand on this one to see other RT
You can also select any of these to search
30
going
We have now selected r10 library expenditures
31
going
Now we can view some items in a chosen format
or we can further modify this search: add,
refine, ...
32
gone
This is one of the items we got
33
ERIC thesaurus on OVID (accessed through RUL)
For library, ask to map it as a thesaurus term
34
going
35
going
36
going
Retrieved and ready to display
37
gone
38
Relevance feedback
  • Method for using information in items judged
    relevant to further refine or change the search
  • e.g., in relevant items we can browse titles,
    descriptors, identifiers, abstracts to get
    leads for further search terms and tactics
  • in some advanced systems this may be done
    automatically
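
A very rough sketch of the automatic variant: count the descriptors that occur in the items judged relevant and propose the most frequent ones not yet in the search. The record layout, the sample items, and the name suggest_terms are assumptions made for this illustration; real systems use more elaborate weighting.

```python
from collections import Counter

def suggest_terms(relevant_items, already_used, top_n=2):
    """Suggest the descriptors most frequent in relevant items and not yet searched."""
    counts = Counter()
    for item in relevant_items:
        counts.update(d for d in item["descriptors"] if d not in already_used)
    return [term for term, _ in counts.most_common(top_n)]

# Hypothetical items the searcher judged relevant.
relevant_items = [
    {"descriptors": ["library administration", "library expenditures", "budgets"]},
    {"descriptors": ["library expenditures", "budgets", "costs"]},
    {"descriptors": ["library administration", "library expenditures"]},
]

print(suggest_terms(relevant_items, already_used={"library administration"}))
```

The same counting could be applied to words from titles or abstracts instead of descriptors.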

39
Query expansion
  • Method for adding, modifying, changing search
    terms in query
  • to broaden, narrow, focus, change terms
  • Many sources can be used
  • relevance feedback, thesauri, dictionaries,
    textbooks, documents, catalogs, people (users,
    colleagues), your own mind and experience
  • Some systems suggest terms for query expansion
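
As an illustration of thesaurus-based expansion, the sketch below adds narrower terms to broaden recall, or broader terms to widen the topic, and joins the result with OR. The thesaurus fragment is invented, the function names are hypothetical, and the OR syntax with quoted phrases is a generic assumption rather than the query language of any specific system.

```python
# Hypothetical thesaurus fragment: descriptor -> relations.
thesaurus = {
    "libraries": {
        "BT": ["information centers"],
        "NT": ["academic libraries", "public libraries"],
        "RT": ["librarians"],
    },
}

def expand_query(term, thesaurus, relations=("NT",)):
    """Return the term plus the thesaurus terms reached through the given relations."""
    expanded = [term]
    entry = thesaurus.get(term, {})
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded

def to_or_query(terms):
    """Join terms with OR, quoting multi-word terms as phrases."""
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)

print(to_or_query(expand_query("libraries", thesaurus)))
print(to_or_query(expand_query("libraries", thesaurus, relations=("BT",))))
```

A term with no thesaurus entry comes back unchanged, so the searcher still decides which suggested terms to keep.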

40
Conclusion
  • At the base of all searching are
  • terms
  • vocabularies
  • languages
  • but a variety exists
  • In reality in searching there is no completely
    controlled or uncontrolled vocabulary
  • matter of degree
  • most importantly, matter of mastery

41
symbolically: controlled vs. free vocabulary
42
thank you!