Per Ahlgren - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Per Ahlgren


1
On a cognitive search strategy
  • Per Ahlgren

2
Overview
  • Background
  • Ingwersen's example
  • A cognitive search strategy
  • Search formulation construction
  • Four situations but two search formulations
  • A remedy
  • Concluding remarks

3
Background
  • Important to provide users of the digital library
    with tools that help them to retrieve information
    relevant to their needs
  • Search strategies - approaches for search problems
  • Example. The building blocks strategy

4
Ingwersen's example
  • Two terms, A and B
  • Two fields, title field (TI) and
    descriptor/identifier field (DE,ID)
  • Four atomic search formulations
  • A/TI; A/DE,ID; B/TI; B/DE,ID
  • Assumed situation with regards to document
    frequency (df): df(A/TI) < df(B/TI) <
    df(A/DE,ID) < df(B/DE,ID).

5
  • Principle (P) - atomic search formulations with
    lower frequencies should be combined before
    formulations with higher frequencies
  • Idea behind P: a term's value for retrieval
    purposes is inversely proportional to the number
    of documents in which the term occurs
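
As a small illustration of principle P, the sketch below orders the four atomic formulations from lowest to highest document frequency. The df counts are invented and chosen only to satisfy the ordering assumed in the example; the function name combination_order is likewise hypothetical.

```python
# Hypothetical document frequencies (invented numbers), chosen only so that
# df(A/TI) < df(B/TI) < df(A/DE,ID) < df(B/DE,ID) holds.
df = {"B/DE,ID": 1500, "A/TI": 120, "A/DE,ID": 910, "B/TI": 340}


def combination_order(frequencies):
    """Return the atomic formulations sorted from lowest to highest df.

    Under principle P, formulations earlier in this list are combined first.
    """
    return sorted(frequencies, key=frequencies.get)


order = combination_order(df)
print(order)  # ['A/TI', 'B/TI', 'A/DE,ID', 'B/DE,ID']
```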

6
A cognitive search strategy
  • Different cognitive agents - author (TI) and
    indexer (DE,ID) - are involved with regards to
    the assignment of terms to documents
  • Occurrence of different cognitive agents
  • Cognitive overlap
  • When constructing Boolean search formulations in
    a two term TI/DE,ID search, consider the factors
    occurrence of different cognitive agents and
    cognitive overlap when combining atomic
    formulations with the AND operator

7
  • Optimal situation: both terms are present in both
    fields (the cognitive agents involved agree about
    the two access points). Expressed by the
    following formulation: A/TI AND B/TI AND A/DE,ID
    AND B/DE,ID.
  • A multiple evidence approach - the strategy
    combines evidence for the relevance of a document

8
Search formulation construction
  • Purpose: stepwise retrieval of a number of
    subsets of the set D of documents that is
    retrieved by A OR B. First formulation, S1:
    A/TI AND B/TI AND A/DE,ID AND B/DE,ID.
  • Two methods
  • (1) NOTPRESET (Ingwersen's method)
  • A new formulation is obtained by (1) combining
    atomic formulations by the AND operator,
    considering the factors (a) presence of A and B,
    (b) occurrence of different cognitive agents and
    (c) document frequency, (2) excluding all the
    preceding formulations by the NOT and OR
    operators, and (3) ANDing the results of (1) and
    (2).

9
  • Should be fairly easy for the user to grasp
  • Example. S2: A/TI AND B/TI AND A/DE,ID NOT S1
  • (2) NOTATOMIC
  • A new formulation is obtained by (1) combining as
    many atomic formulations as possible (in the
    light of earlier formulations) by the AND
    operator, considering the factors (a) presence of
    A and B, (b) occurrence of different cognitive
    agents and (c) document frequency, (2) excluding
    by the NOT and OR operators all the atomic
    formulations that are not part of the result of
    (1), and (3) ANDing the results of (1) and (2).
  • Yields, in most cases, shorter formulations than
    NOTPRESET

10
  • Should be fairly easy for the user to grasp
  • Is abandoned in step 10 and step 11
  • Example. (2) A/TI AND B/TI AND A/DE,ID NOT B/DE,ID
  • Presence of A and B the most important factor
  • Occurrence of different cognitive agents more
    important than document frequency
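
The two example formulations for the second step, one built with NOTPRESET and one with NOTATOMIC, can be compared over all 16 presence/absence situations. The sketch below assumes each atomic formulation behaves as a simple true/false test on a document; the function names are hypothetical.

```python
from itertools import product

ATOMS = ["A/TI", "B/TI", "A/DE,ID", "B/DE,ID"]

# Each situation maps every atomic formulation to True (term present in the
# field) or False (term absent from the field).
situations = [dict(zip(ATOMS, flags)) for flags in product([True, False], repeat=4)]


def s1(s):
    # A/TI AND B/TI AND A/DE,ID AND B/DE,ID
    return all(s[a] for a in ATOMS)


def s2_notpreset(s):
    # A/TI AND B/TI AND A/DE,ID NOT S1
    return s["A/TI"] and s["B/TI"] and s["A/DE,ID"] and not s1(s)


def s2_notatomic(s):
    # A/TI AND B/TI AND A/DE,ID NOT B/DE,ID
    return s["A/TI"] and s["B/TI"] and s["A/DE,ID"] and not s["B/DE,ID"]


same = all(s2_notpreset(s) == s2_notatomic(s) for s in situations)
print(same)  # True
```

In this model both formulations select the same single situation, the one where only B/DE,ID is missing; they differ in how the exclusion is written, not in what is retrieved.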

11
Four situations but two search formulations
  • Consider the (NOTATOMIC) formulations
  • (10) A/TI NOT (B/TI OR B/DE,ID) and
  • (11) B/TI NOT (A/TI OR A/DE,ID).
  • (10) and (11) are indefinite with respect to
    A/DE,ID and B/DE,ID, respectively.

12
  • Formulations (10) and (11) therefore do not
    distinguish between
  • (1) A is present in TI but not in DE,ID, and B is
    absent from both fields
  • and
  • (2) A is present in both fields, and B is absent
    from both fields,
  • or between
  • (3) B is present in TI but not in DE,ID, and A is
    absent from both fields
  • and
  • (4) B is present in both fields, and A is absent
    from both fields.
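
The indefiniteness can be checked by enumerating the 16 situations and listing which ones each formulation accepts. This is a sketch under the same true/false model of atomic formulations; f10, f11 and present are hypothetical helper names.

```python
from itertools import product

ATOMS = ["A/TI", "A/DE,ID", "B/TI", "B/DE,ID"]
situations = [dict(zip(ATOMS, flags)) for flags in product([True, False], repeat=4)]


def f10(s):
    # (10) A/TI NOT (B/TI OR B/DE,ID); nothing is said about A/DE,ID
    return s["A/TI"] and not (s["B/TI"] or s["B/DE,ID"])


def f11(s):
    # (11) B/TI NOT (A/TI OR A/DE,ID); nothing is said about B/DE,ID
    return s["B/TI"] and not (s["A/TI"] or s["A/DE,ID"])


def present(s):
    """List the atomic formulations that are satisfied in a situation."""
    return [a for a in ATOMS if s[a]]


covered_by_10 = [present(s) for s in situations if f10(s)]
covered_by_11 = [present(s) for s in situations if f11(s)]
print(covered_by_10)  # [['A/TI', 'A/DE,ID'], ['A/TI']]
print(covered_by_11)  # [['B/TI', 'B/DE,ID'], ['B/TI']]
```

Each formulation accepts two situations, which correspond to (2) and (1) for formulation (10) and to (4) and (3) for formulation (11).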

13
  • We then need four formulations instead of (10)
    A/TI NOT (B/TI OR B/DE,ID) and (11) B/TI NOT
    (A/TI OR A/DE,ID), that is, instead of S10 and
    S11: one formulation for each of the four
    situations.
  • Ingwersen's formulations express only 11 of the
    16 possible situations with regards to the
    presence of A and B in the two fields.

14
Figure 1. The 16 possible situations with regards
to the presence of A and B in the two fields.

15
A remedy
  • NOTATOMIC
  • Substitute the following four formulations (in the
    given order)
  • 9a: A/TI AND A/DE,ID NOT (B/TI OR B/DE,ID)
  • 9b: B/TI AND B/DE,ID NOT (A/TI OR A/DE,ID)
  • 10: A/TI NOT (A/DE,ID OR B/TI OR B/DE,ID)
  • 11: B/TI NOT (B/DE,ID OR A/TI OR A/DE,ID)
  • for
  • (10) A/TI NOT (B/TI OR B/DE,ID)
  • and
  • (11) B/TI NOT (A/TI OR A/DE,ID).

16
  • The new set of formulations expresses 15 of the 16
    possible cases with regards to the presence of A
    and B in the two fields, not just 11.
  • NOTATOMIC is not abandoned.
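
A quick check of the substitution: in the sketch below, the four new formulations are compared with the two they replace over all 16 situations. It assumes the same true/false model of atomic formulations and covers only these six formulations, not the full set; the function names are hypothetical.

```python
from itertools import product

ATOMS = ["A/TI", "A/DE,ID", "B/TI", "B/DE,ID"]
situations = [dict(zip(ATOMS, flags)) for flags in product([True, False], repeat=4)]


def old_10(s):
    return s["A/TI"] and not (s["B/TI"] or s["B/DE,ID"])


def old_11(s):
    return s["B/TI"] and not (s["A/TI"] or s["A/DE,ID"])


def new_9a(s):
    return s["A/TI"] and s["A/DE,ID"] and not (s["B/TI"] or s["B/DE,ID"])


def new_9b(s):
    return s["B/TI"] and s["B/DE,ID"] and not (s["A/TI"] or s["A/DE,ID"])


def new_10(s):
    return s["A/TI"] and not (s["A/DE,ID"] or s["B/TI"] or s["B/DE,ID"])


def new_11(s):
    return s["B/TI"] and not (s["B/DE,ID"] or s["A/TI"] or s["A/DE,ID"])


def splits(old, first, second):
    """True if first and second never overlap and together equal old."""
    return all(
        old(s) == (first(s) or second(s)) and not (first(s) and second(s))
        for s in situations
    )


print(splits(old_10, new_9a, new_10))  # True
print(splits(old_11, new_9b, new_11))  # True
```

Each old formulation is divided into two non-overlapping parts, so the four situations are retrieved separately and nothing previously retrieved is lost.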

17
  • NOTPRESET
  • It is also possible to use a special case of
    NOTPRESET, say NOTPRESET, to construct
    formulations that express the four situations in
    question. When constructing a new formulation,
    the first step in NOTPRESET is identical with
    the first step in NOTATOMIC: combine as many
    atomic formulations as possible (in the light of
    earlier formulations) by the AND operator,
    considering the factors presence of A and B,
    occurrence of different cognitive agents and
    document frequency.

18
  • The new set of formulations expresses 15 of the 16
    possible cases with regards to the presence of A
    and B in the two fields, not just 11.

19
Concluding remarks
  • Ingwersen's set of formulations should be
    modified to correspond to 15 of the 16 possible
    situations with regards to the presence of the
    terms A and B in the two fields.
  • Ingwersen's approach gives the Boolean searcher a
    hint concerning the order in which the parts of a
    (possibly large) document set should be
    retrieved.

20
  • If the command language does not admit
    abbreviation of an OR formulation, NOTATOMIC is in
    my opinion preferable to NOTPRESET.