MGT2201 Information Management - PowerPoint PPT Presentation

1
MGT2201 Information Management
  • Module 6 - Classification and Indexing for
    Retrieval

2
Records retrieval
  • Every minute of delay in finding a record is
    costly - in user or requester waiting time and in
    filer searching time - to say nothing of possible
    loss of business as an ultimate result
  • Smith and Kallaus, 1997
  • Effective retrieval requires a knowledge of
    classification and indexing techniques and a
    thorough understanding of the organisation's
    activities

3
Classification Systems
  • ..systems (which) reflect the business of the
    organisation from which they derive and are
    normally based on an analysis of the
    organisation's business activities. The systems
    can be used to support a variety of records
    management processes
  • AS ISO 15489 Part 1 (Section 9)
  • Classification groups records to limit the
    searching process
  • AS ISO 15489 recommends the use of a business
    classification scheme

4
How business classification assists in the
management of records
  • Providing linkages between individual records
    which accumulate to provide a continuous record
    of activity
  • Ensuring records are named in a consistent manner
    over time
  • Assisting in the retrieval of all records
    relating to a particular function or activity
  • Determining security protection and access
    appropriate for sets of records
  • Allocating user permissions for access to, or
    action on, particular groups of records
  • Distributing responsibility for management of
    particular sets of records
  • Distributing records for action
  • Determining appropriate retention periods and
    disposition actions for records

5
Steps in developing a business classification
scheme
Gather documentary information and conduct
interviews
Understand overall mission/objectives of the
organisation
Derive and list the functions needed to achieve
objectives
Identify hierarchies of activities which support
each function
Identify the transactions which operationalise
each activity
Identify processes/activities common across
functions
Produce a map of the hierarchies for each function
6
REQUIRED ATTRIBUTES OF BUSINESS CLASSIFICATION
SCHEMES
  • Sufficient classes and subclasses (keywords and
    descriptors) for all business functions and
    activities
  • Terminology derived from functions/activities,
    not organisational unit names
  • Unambiguous terminology
  • Discrete classes (keywords)
  • Hierarchical - from most general to most specific
    concept
  • Specific to the organisation
  • Devised in consultation with users
  • Maintained to reflect changing business needs

7
Business v knowledge-base classification
  • Business classification aims to
  • Create a scheme for arrangement and retrieval
  • Provide a basis for determining
  • Which documents to capture for evidential
    purposes
  • Determine retention periods
  • Define and assign security levels
  • Knowledge base classification aims to provide a
    basis for arrangement and retrieval of records
    only
  • Business classification is based on the broad
    core functions and activities of the organisation
  • Knowledge-based classification is based on either
  • literary warrant (document content or subject
    matter) or
  • user warrant (the needs of the user group)
  • Business classification is limited to activities
    which have accountability requirements;
    knowledge-based systems do not make this distinction

8
Series, files and documents
  • Classification provides a basis for arranging and
    retrieving records
  • Classification takes place at the level of record
    series
  • Series .. a group of identical or related records
    that are normally used and filed as a unit, and
    which permits evaluation as a unit for retention
    scheduling purposes
  • Records series are divided into
  • Subseries (maybe more than one)
  • Files
  • Documents

9
The record series hierarchy
There may be a number of sub-series
SERIES
HOUSE SALES
SUB-SERIES
Toowoomba
FILE
Sale to G Bass
DOCUMENT
Contract of Sale
10
Defining series and sub-series
  • Series - a group of identical or related records
    that are normally used and filed as a unit, and
    which permits evaluation as a unit for retention
    scheduling purposes
  • Sub-series - a sub-division of a record series to
    provide additional precision in arranging and
    retrieving records

Activity 6.1
11
File based v document based systems
  • Individual retrieval units may be either files or
    documents
  • A file is a group of related documents located
    within a file cover or folder

12
Establishing new files
  • Creation of a new file needs to be properly
    authorised
  • A new file should only be created when no file
    previously existed on that activity or subject
  • When information in one file also relates to an
    issue dealt with in another file, cross
    references should be included in the indexing
    system

13
Registration of records
  • ..the act of giving a record a unique identifier
    on its entry into a system
  • (AS ISO 15489, Part 1, 3.18)
  • The purpose of registration is to provide
  • evidence that a record has been created or
    captured in a records system
  • (AS ISO 15489, Part 2, 4.3.3)
  • a record may be registered at the file or
    document level depending on assessment of
    evidence requirements

14
Variation in the registration process
  • Registration in paper based manual systems
  • A register is normally a separate document
  • Registration in computerised (automated) systems
  • A register is likely to be a combination of data
    elements
  • Registration in electronic records systems
  • Register may include classification and
    determination of disposition and access status
  • Can register records automatically without the
    intervention of a records management practitioner
  • Metadata required for registration can often be
    automatically derived from the computing and
    business environment from which the record
    originates

Registration should be unalterable with any
changes able to be tracked through an audit trail
15
Minimum metadata required at registration
  • Unique identifier
  • Date and time of registration
  • Title or abbreviated description
  • Author (person or corporate body), sender or
    recipient

Actual metadata required will depend on evidence
requirements and type of technology used
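A minimal sketch of a register entry holding the minimum metadata listed above. The class name `RegisterEntry`, the function `register` and the year/sequence identifier format are assumptions made for illustration; they are not prescribed by AS ISO 15489. The frozen dataclass only hints at the idea that a registration should be unalterable; it does not provide an audit trail.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import count

_sequence = count(1)  # hypothetical running number for the register


@dataclass(frozen=True)  # frozen: fields cannot be reassigned once registered
class RegisterEntry:
    identifier: str          # unique identifier
    registered_at: datetime  # date and time of registration
    title: str               # title or abbreviated description
    author: str              # author, sender or recipient


def register(title: str, author: str) -> RegisterEntry:
    """Create an entry with a year/sequence identifier (assumed format)."""
    now = datetime.now(timezone.utc)
    identifier = f"{now.year}/{next(_sequence)}"
    return RegisterEntry(identifier, now, title, author)


entry = register("Contract of Sale", "G Bass")
```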
16
Information which may be included in the record's
unique identifier (1)
  • Document name or title
  • Text description or abstract
  • Date of creation
  • Date and time of communication and receipt
  • Incoming, outgoing or internal
  • Author (with his/her affiliation)
  • Sender (with his/her affiliation)
  • Recipient (with his/her affiliation)
  • Physical form
  • Classification according to the classification
    scheme
  • Links to related records documenting the same
    sequence of business activity or relating to same
    person or case, if the record is part of a case
    file

17
Information which may be included in the record's
unique identifier (2)
  • Business system from which the record was
    captured
  • Application software and version under which the
    record was created or in which it was captured
  • Standard with which the record's structure
    complies (eg Standard Generalised Markup Language
    (SGML), Extensible Mark-up Language (XML))
  • Details of embedded document links including the
    applications software and version under which the
    linked record was created
  • Templates required to interpret document
    structure
  • Access
  • Retention period
  • Other structural and contextual information
    useful for management purposes

18
Registration at document or file level
  • Registration may take place at document or file
    level
  • Even within a file-based system, important
    documents may still be registered
  • File based registration process (see next slide)
  • Document based registration process
    (correspondence management system)
  • Documents registered as discrete items and given
    their own number and/or classification terms
  • Each document also usually labelled with number
    of file in which it is stored

19
Registration at file level
Start
Document arrives to be classified
1 Does a file exist on activity or subject?
  • No - Create new file
  • Yes - go to question 2
2 Does file title still reflect activity/subject
accurately?
  • No - Modify file title or classification terms
  • Yes - Attach document to file
End
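The file-level registration decisions on this slide can be sketched in code. This is only an illustration: the function name `file_document`, the dictionary layout and the sample document names are hypothetical, and the sketch assumes that the document is attached to the file after a new file is created or a title is modified, which the flattened flowchart does not show explicitly.

```python
from typing import Dict, Optional


def file_document(files: Dict[str, dict], activity: str, document: str,
                  title_accurate: bool = True,
                  revised_title: Optional[str] = None) -> dict:
    """Attach a document to the file for an activity or subject."""
    file = files.get(activity)
    if file is None:
        # No file exists on this activity or subject: create a new file
        file = {"title": activity, "documents": []}
        files[activity] = file
    elif not title_accurate and revised_title:
        # File title no longer reflects the activity: modify the title
        file["title"] = revised_title
    # Assumed final step for every branch: attach the document to the file
    file["documents"].append(document)
    return file


files = {}
file_document(files, "Sales - Houses - Toowoomba", "Contract of Sale")
file_document(files, "Sales - Houses - Toowoomba", "Letter of offer")
```

Both documents end up on the same file, because the second call finds the file created by the first.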
20
Indexing
  • ..the process of establishing and applying terms
    or codes to particular records by which they may
    be retrieved.
  • Appropriate allocation of indexing terms allows
    retrieval of records across classifications or
    categories.
  • (AS 4390-1996, Part 4, 8.1, p10)
  • appropriate allocation of index terms extends
    the possibilities of retrieval of records across
    classifications, categories and media
  • (AS ISO 15489, Part 2, 4.3.4.3)

21
Deriving indexing terms
  • Indexing terms can be derived from the document
    by computer or assigned manually using
    pre-established categories or indexing terms such
    as a thesaurus
  • Indexing terms are commonly derived from
  • The format or nature of the record
  • The title or main heading of the record
  • The subject content of the record, usually in
    accord with the business activity
  • the abstract of a record
  • Dates associated with transactions recorded in
    the record
  • Names of clients or organisations
  • Particular handling or processing requirements
  • Attached documentation not otherwise identified
    or
  • The uses of the record
  • (AS ISO 15489, Part 1, 4.3.4.3)

22
File titling
  • Titles need to be representative of a record's
    context as well as its content
  • The file title, possibly a set of index terms,
    acts as a label for the file
  • File titles aim to achieve two objectives
  • to help minimise confusion over what file to
    place a document in
  • to aid retrieval

Automated retrieval software uses sequential
numbering, making titles even more important.
Each word in a title is searchable.
23
File title structures (1)
  • The words used in titles / indexes may be
  • selected from a thesaurus or
  • natural language terms (taken from document
    itself not from a list of allowed terms)
  • A number of options exist for file titling
    structures eg
  • OPTION 1: controlled vocabulary in hierarchical
    order followed by a number of natural language
    terms, eg
  • Controlled vocabulary: Sales - Houses - Toowoomba
  • Natural language terms: Sale to B Lloyd
  • OPTION 2: natural language summary statement, eg
  • Sale of house at 143 Taylor Street to B Lloyd

24
File title structures (2)
  • OPTION 3: a natural language summary statement
    followed by a set of controlled vocabulary terms
    in no particular order, eg
  • Sale of 143 Taylor Street
  • Toowoomba, Sales, Houses
  • OPTION 4: keyword and descriptors in no
    particular order
  • OPTION 5: Lintons Keyword System of a keyword
    followed by up to four descriptors in
    hierarchical order from general to specific, eg
  • keyword: Sales
  • descriptors: Houses, Toowoomba, Lloyd
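As a rough illustration of a title built from a keyword followed by up to four descriptors, the sketch below joins the terms in general-to-specific order. The function name `keyword_title` and the " - " separator (borrowed from the Option 1 example) are assumptions, not part of any published keyword system.

```python
from typing import Sequence

MAX_DESCRIPTORS = 4  # limit stated on the slide for Option 5


def keyword_title(keyword: str, descriptors: Sequence[str]) -> str:
    """Build a hierarchical file title: keyword first, then descriptors."""
    if len(descriptors) > MAX_DESCRIPTORS:
        raise ValueError("a title takes at most four descriptors")
    return " - ".join([keyword, *descriptors])


title = keyword_title("Sales", ["Houses", "Toowoomba", "Lloyd"])
# title == "Sales - Houses - Toowoomba - Lloyd"
```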

25
Keyword AAA Thesaurus
  • Used widely in public sector organisations
  • complies with AS4390-1996 i.e.
  • based on business classification rather than
    knowledge-base classification
  • tight hierarchical structure employing three
    levels of terms i.e.
  • keyword
  • activity descriptor (may be more than one)
  • subject descriptor/free text (may be more than
    one)
  • can be used with electronic or paper records
  • http://www.unimelb.edu.au/CSD/image/execserv/keyintro.htm

26
Advantages and disadvantages of hierarchical file
titling
  • Hierarchical file titling allows
  • Browsing/printing of alphabetical listings with
    file titles grouped together within broad class
    terms (keywords) and activities
  • Broad searching (at level of keyword) or
  • Very specific searching (at level of free text)
  • Possible disadvantages
  • Need to prespecify as many hierarchies as
    possible
  • Tendency to force each title into an
    inappropriate hierarchical order

27
Metadata and Electronic Records
  • Metadata .. A description or profile of a
    document or other information object which may
    contain data about its context, form and content.
  • A vital ingredient of electronic recordkeeping
    because the risk of loss of electronic documents
    is much higher than for paper records
  • addition of metadata can be automated by records
    management software programs
  • http://www.gmb.com.au/products/button/intro.htm
  • Overcomes inconsistency in naming electronic
    documents
  • Essential to include in classification and
    indexing process the location of electronic
    records

28
Steps in the Indexing Process (also involves
classification)
  • 1 Examine the document in an attempt to classify
    and find suitable indexing terms - look for
  • title
  • names of originating persons or organisations
  • opening and closing paragraphs
  • groups of words underlined or printed in
    different typefaces
  • 2 Identify useful retrieval concepts by asking
    questions such as
  • Does the document/file record a transaction?
  • Does the document/file record an activity or
    course of action?
  • Does the document/file refer to methods for
    accomplishing a course of action?
  • Does the document/file deal with a particular
    product, organisation, or condition?
  • Does the subject of the document/file contain an
    action concept ie an operation or process?

29
Appropriate search elements (retrieval keys)
  • Subject terms eg Sale of Houses
  • Proper names eg G Lloyd
  • Document types eg contract of sale
  • Identifying numbers eg 2000/10
  • The number of indexing terms employed will be
    determined by -
  • file titling structures
  • user needs
  • available software

30
Steps in the Indexing Process (cont)
  • 1 Examine the document
  • 2 Identify useful retrieval concepts
  • 3 Translate concepts into the indexing
    vocabulary. Issues to be considered include
  • controlled and/or natural language
  • method of indexing proper names
  • pre-coordinate or post-coordinate method
  • how specific index headings will be
  • how to achieve consistency in indexing

31
Controlled v Natural Language vocabulary
  • Controlled vocabulary
  • indexer translates identified concepts into the
    standardised or authorised allowed terms in an
    alphabetical thesaurus
  • Natural language
  • non-thesaurus terms and phrases assigned by the
    indexer in an extra field eg Narrative
  • often include proper names
  • Summaries or abstracts can also be used as index
    terms where terms in any field are searchable
    online

32
Methods of Indexing proper names
  • Personal, organisational and other proper names
    are common indexing terms
  • file titles may consist of a name only or a name
    plus several additional indexing terms
  • consistency in name indexing is difficult
  • names often have various forms
  • composed of elements that can be cited in
    different orders
  • have a tendency to change over years
  • Directory method is most common method of
    indexing proper names

33
Pre-coordinate and post-coordinate indexing (1)
  • Pre-coordinate indexing - terms in a compound
    topic are pre-combined into a single subject
    heading, eg
  • Sales - Houses - 143 James Street
  • necessary to use cross referencing, eg
  • 143 James Street - See Under Sales - Houses -
    143 James Street
  • often used in on-line indexing systems as search
    can be conducted on single words within a heading

34
Pre-coordinate and post-coordinate indexing (2)
  • Post-coordinate indexing - each term in a
    multi-aspect topic is entered as an individual
    indexing unit eg
  • Sales
  • Houses
  • 143 James Street
  • terms may be entered in any order
  • can be searched using Boolean operators
  • better suited to online searching
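A small sketch of post-coordinate indexing: each term is entered as its own indexing unit and the terms are combined only at search time with Boolean operators. The file numbers and the terms "Land" and "Leases" are made up for the example, and the set-based index is one possible representation rather than how any particular records system works.

```python
from collections import defaultdict

index = defaultdict(set)  # indexing term -> set of file numbers


def add_file(file_no, terms):
    """Enter each term as an individual indexing unit, in any order."""
    for term in terms:
        index[term.lower()].add(file_no)


add_file("2000/10", ["Sales", "Houses", "143 James Street"])
add_file("2000/11", ["Sales", "Land"])
add_file("2000/12", ["Leases", "Houses"])

sales_and_houses = index["sales"] & index["houses"]  # AND -> {"2000/10"}
sales_or_houses = index["sales"] | index["houses"]   # OR  -> all three files
sales_not_houses = index["sales"] - index["houses"]  # NOT -> {"2000/11"}
```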

35
Specificity of index headings
  • Keywords are examples of broad class terms
  • proper names are examples of specific indexing
    terms
  • indexing terms should include some keywords
    (broad terms) plus descriptors (more specific)
    and possible free text (most specific eg proper
    names)

36
Consistency in Indexing
  • Use of thesaurus, naming rules, guidelines on
    translation lead to
  • consistent indexing, predictability in retrieval
  • If no use of thesaurus etc
  • problems with scattered files,
  • poor retrieval rates,
  • incomplete files, problems with efficient
    retention and disposal

37
Impact of technology on indexing and retrieval
Increasingly sophisticated indexing software
Increasingly sophisticated search engines
Increasingly sophisticated navigational mechanisms
38
New concepts for classification and indexing in
technological environments
GROUPWARE
Digitally based team collaboration. An organisational
intranet could be regarded as a very simple
example of groupware
WORKFLOW SOFTWARE
Automates the flow of tasks and information
around an organisation
COMPOUND DOCUMENTS
Digital documents which may be a combination of
text, audio or graphic objects, with elements not
necessarily stored together on one server but
brought together through hypertext links
39
Indexing and Search Methods for Full Text
Databases and Networked information
  • The nature and extent of human classification and
    indexing required will depend on the storing,
    indexing, and searching software capabilities of
    the system
  • LANs and intranets allow the requesting of
    information by a client from document collections
    stored on a server
  • Geographically dispersed organisations can access
    corporate documents stored at different points on
    the network

40
RETRIEVAL IN NEW TECHNOLOGICAL ENVIRONMENT
Successful retrieval depends on
  • Systematic Directory Structures
  • Standardised Naming Conventions
41
Evaluating retrieval performance
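RECALL - the number of relevant documents retrieved,
as a proportion of all relevant documents in the
collection
PRECISION - the number of retrieved documents found
to be relevant, as a proportion of all documents
retrieved
[Diagram: Recall and Precision]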
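The two measures can be worked through with a short calculation. The sketch uses the usual information-retrieval definitions: recall is the share of all relevant documents that were retrieved, and precision is the share of retrieved documents that are relevant. The file identifiers are invented for the example.

```python
def recall(retrieved: set, relevant: set) -> float:
    """Relevant documents retrieved / all relevant documents."""
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved: set, relevant: set) -> float:
    """Relevant documents retrieved / all documents retrieved."""
    return len(retrieved & relevant) / len(retrieved)


relevant = {"f1", "f2", "f3", "f4"}  # files that actually answer the request
retrieved = {"f1", "f2", "f5"}       # files the search returned

r = recall(retrieved, relevant)      # 2 of 4 relevant files found -> 0.5
p = precision(retrieved, relevant)   # 2 of 3 retrieved files relevant
```

A broader search usually raises recall and lowers precision, and a narrower search does the reverse.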
42
Indexing and Searching Technologies
  • Free text searching - computer searches for word
    or phrase in one or more database fields or
    document full text
  • Free text scanning - computer sequentially scans
    terms in each document or a database to find a
    match
  • N-grams or suffix arrays - index stores word
    fragments on which matching takes place
  • Pattern recognition - index stores binary
    representations - overcomes need for correct
    spelling but reduces precision - great for sound,
    video and images
  • Document clustering - document assigned a theme
    which is used as the index value
  • Hypertext systems - nodes or chunks of
    information (including text, images or sound) are
    stored and connected by means of links or pathways
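To illustrate matching on word fragments, the sketch below splits words into three-letter n-grams and compares two words by the fragments they share. The overlap measure used (shared fragments divided by all distinct fragments, often called Jaccard similarity) is a choice made for this example, not something the slide specifies.

```python
def ngrams(word: str, n: int = 3) -> set:
    """Return the set of n-letter fragments of a word."""
    word = word.lower()
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def similarity(a: str, b: str, n: int = 3) -> float:
    """Share of fragments two words have in common (Jaccard overlap)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


score = similarity("Toowoomba", "Toowomba")  # misspelling still scores 0.625
```

Because most fragments survive a small spelling error, a fragment index can still find the intended word.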

43
Search Approaches (1)
  • Boolean searches
  • and - Loans and Students
  • or - Loans or Students
  • not - Loans not Students
  • Wildcard searching (to search for a word where
    some letter/s are missing at the beginning, in
    the middle or at the end of the word)
  • McG* (* = more than one letter) - truncation
    example
  • McGra@y (@ = just one letter)
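Wildcard matching can be tried out with Python's standard `fnmatch` module. Note that wildcard symbols differ between systems: in `fnmatch`, `*` stands for any run of letters and `?` for exactly one letter. The list of names is made up for the example.

```python
from fnmatch import fnmatchcase

names = ["McGrath", "McGregor", "McGrady", "McGraw"]  # hypothetical index entries


def wildcard_search(pattern: str, entries: list) -> list:
    """Return the entries that match a wildcard pattern (case-sensitive)."""
    return [entry for entry in entries if fnmatchcase(entry, pattern)]


truncated = wildcard_search("McG*", names)      # all four names
one_letter = wildcard_search("McGra?y", names)  # ["McGrady"]
```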

44
Search approaches (2)
  • Proximity operators - used to stipulate that
    terms must be adjacent, in same sentence, in same
    paragraph etc
  • Student ADJ Loans
  • Fuzzy logic - search specifications made more
    vague than that input by researcher
  • Eg include OR search results if only AND is
    requested, but rank OR results lower
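A minimal sketch of an adjacency operator such as ADJ. It assumes that ADJ means the first term is immediately followed by the second, and it splits text on whitespace only, so punctuation attached to a word would prevent a match. The sample sentences are invented.

```python
def adjacent(text: str, first: str, second: str) -> bool:
    """True if `first` is immediately followed by `second` in the text."""
    words = text.lower().split()
    pairs = zip(words, words[1:])  # every pair of neighbouring words
    return (first.lower(), second.lower()) in pairs


hit = adjacent("Student loans approved in 2000", "Student", "Loans")  # True
miss = adjacent("Loans for each student", "Student", "Loans")         # False
```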

45
Relevance Ranked Searching
  • Documents found in response to a query ranked
    from most to least relevant
  • 2 approaches
  • term summing - computer counts how many times
    each term in the query occurs in the document
  • weighted term summing - summing based on
    frequency of occurrence and weighted value which
    is dependent on the uniqueness of the term
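The two ranking approaches can be compared in a short sketch. Term summing counts occurrences of the query terms; the weighted version multiplies each count by a weight that grows as the term becomes rarer across the collection. The logarithmic weight used here (inverse document frequency) is an assumption for illustration, since the slide does not say how uniqueness is measured, and the three documents are invented.

```python
import math
from collections import Counter

docs = {
    "d1": "student loans and student fees",
    "d2": "home loans",
    "d3": "student enrolment",
}


def term_sum(query: str, text: str) -> int:
    """Count how many times each query term occurs in the document."""
    counts = Counter(text.lower().split())
    return sum(counts[term] for term in query.lower().split())


def weighted_term_sum(query: str, text: str, collection: dict) -> float:
    """Weight each term count by how rare the term is in the collection."""
    counts = Counter(text.lower().split())
    score = 0.0
    for term in query.lower().split():
        holders = sum(1 for d in collection.values() if term in d.lower().split())
        if holders:
            score += counts[term] * math.log(len(collection) / holders)
    return score


ranked = sorted(docs, key=lambda d: term_sum("student loans", docs[d]), reverse=True)
# "d1" ranks first: it contains "student" twice and "loans" once
```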

46
Internet search engines
  • Web search engines eg Google...
  • Use computer programs to move through Web
    addresses, titles/headers and words on web pages
    to collect words and addresses and place them in
    an index and rank relevance of sites to query