Introduction to CNIDR's Isite

Slides: 18
Provided by: archibal
Transcript and Presenter's Notes

Title: Introduction to CNIDR's Isite


1
Introduction to CNIDR's Isite
  • Jim Fullton - MCNC/CNIDR
  • Archie Warnock - A/WWW Enterprises

2
What is Isite?
  • A freely available implementation of the Z39.50
    search/retrieval protocol
  • It includes a Unix-based server, a WWW gateway, a
    command-line client and a sophisticated text
    search engine
  • ftp://ftp.cnidr.org/pub/NIDR.tools/Isite
  • http://vinca.cnidr.org/software/Isite/Isite.html

3
What is Isearch?
  • Isearch is the successor to freeWAIS
  • Isearch is a sophisticated full-text search and
    retrieval system
  • Isearch is a component of Isite, an
    implementation of the NISO standard protocol
    Z39.50 for information search and retrieval
  • ftp://ftp.cnidr.org/pub/NIDR.tools/Isearch
  • http://vinca.cnidr.org/software/Isearch/Isearch.html

4
System Components - I
  • Iindex, the Text Indexer - builds searchable
    version of the document collection
  • Implements fast word-based searching
  • Document parser - recognize start/end of
    individual documents
  • Field parser - recognize start/end of fields
    within individual documents
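
The indexing steps on this slide can be sketched roughly as follows. This is a minimal illustration in Python, not Iindex itself: the `---` document separator, the `name: value` field layout, and all function names are assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical record layout: documents separated by a line of "---",
# fields written as "name: value". Neither convention comes from Isite.
DOC_SEPARATOR = "\n---\n"

def parse_documents(raw):
    """Document parser: split raw text into individual documents."""
    return [d.strip() for d in raw.split(DOC_SEPARATOR) if d.strip()]

def parse_fields(doc):
    """Field parser: map each field name to its text."""
    fields = {}
    for line in doc.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip().lower()] = value.strip()
    return fields

def build_index(raw):
    """Build a word -> {(doc_id, field)} inverted index for word-based lookup."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(parse_documents(raw)):
        for field, text in parse_fields(doc).items():
            for word in re.findall(r"[a-z0-9]+", text.lower()):
                index[word].add((doc_id, field))
    return index

SAMPLE = (
    "title: Matrix methods\nauthor: Smith"
    "\n---\n"
    "title: Search engines\nauthor: Jones"
)
index = build_index(SAMPLE)
```

Looking up `index["matrix"]` then gives the documents and fields containing that word without rescanning the text, which is the point of building the index ahead of time.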

5
System Components - II
  • Isearch, the Search engine - searches a document
    collection based on user-supplied query
  • Command line search
  • Primarily used for testing
  • WWW gateway (using CGI)
  • End-user interface using forms
  • Z39.50 gateway

6
Isearch Capabilities
  • Fast full-text search
  • US AIDS Patent Collection - can search 250,000
    patents in < 1 second
  • Fielded search
  • Can restrict searches to title, author, abstract,
    other fields
  • Relevance ranking
  • Search hits are assigned scores and sorted

7
Isearch Capabilities
  • Word truncation
  • search for matri* matches matrix and
    matrices
  • Boolean functions
  • AND, OR and ANDNOT combinations of different
    fields
  • Customized presentation of results
  • Phrase searching (coming soon)
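
Word truncation and the Boolean operators can be sketched over sets of document ids. The postings table, the trailing `*` truncation marker, and the function names are assumptions of this example, not Isearch's internals.

```python
# Hypothetical postings: word -> set of document ids.
POSTINGS = {
    "matrix": {1, 2},
    "matrices": {2, 3},
    "matter": {4},
    "algebra": {2, 5},
}

def lookup(term):
    """Resolve a term; a trailing '*' requests right truncation (prefix match)."""
    if term.endswith("*"):
        prefix = term[:-1]
        result = set()
        for word, docs in POSTINGS.items():
            if word.startswith(prefix):
                result |= docs
        return result
    return set(POSTINGS.get(term, set()))

def combine(op, left, right):
    """Apply a Boolean operator to two result sets."""
    if op == "AND":
        return left & right
    if op == "OR":
        return left | right
    if op == "ANDNOT":
        return left - right
    raise ValueError(f"unknown operator: {op}")
```

Here `lookup("matri*")` collects the hits for both matrix and matrices but not matter, and `combine("ANDNOT", lookup("matri*"), lookup("algebra"))` removes any of those documents that also mention algebra.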

8
Isearch Customization
  • What's needed to customize Isearch?
  • Isearch is written in C++
  • Documents are C++ objects - data + procedures
  • Already have SGML and HTML, among others
  • Object technology allows code reusability,
    customizing only where differences from existing
    objects occur

9
Isearch Customization
  • What's needed to make arbitrary documents
    searchable?
  • Code to parse documents
  • Code to parse fields
  • Code to build brief and full result records
  • Yes, it requires programming
  • But, many of these are derived from existing
    procedures
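
The idea of deriving a new document type from an existing one, overriding only what differs, might look like the sketch below. It is written in Python for brevity; the class names, method names, and the `name: value` layout are hypothetical and do not reproduce Isearch's own document type interface.

```python
class DocType:
    """Hypothetical base type: plain text, documents separated by blank lines."""

    def parse_documents(self, raw):
        """Code to parse documents: split on blank lines."""
        return [b.strip() for b in raw.split("\n\n") if b.strip()]

    def parse_fields(self, doc):
        """Code to parse fields: the base type has one field, the whole text."""
        return {"text": doc}

    def brief_record(self, doc):
        """Brief result record: the first line only."""
        return doc.splitlines()[0]

    def full_record(self, doc):
        """Full result record: the document unchanged."""
        return doc


class ColonTagged(DocType):
    """Derived type: overrides field parsing and the brief record only."""

    def parse_fields(self, doc):
        fields = {}
        for line in doc.splitlines():
            name, sep, value = line.partition(":")
            if sep:
                fields[name.strip().lower()] = value.strip()
        return fields

    def brief_record(self, doc):
        # Prefer the title field; fall back to the inherited behaviour.
        return self.parse_fields(doc).get("title", super().brief_record(doc))
```

`ColonTagged` reuses `parse_documents` and `full_record` from the base class unchanged, so only the field parser and the brief record had to be written.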

10
Introduction to Z39.50
  • Developed for search and retrieval
  • Networked, client/server environment
  • Tested by working information scientists (Z39.50
    Implementors Group)
  • Commercial and public domain support (Isite from
    CNIDR)
  • http://www.ds.internic.net/z3950/z3950.html

11
Attribute Sets
  • Attributes define how the query is specified
  • Use - field names
  • Relation - comparisons
  • Position - location in field
  • Structure - word/phrase/key/etc
  • Truncation - left/right/none/etc
  • Completeness - subfield/field
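
As a rough illustration of how attributes qualify a query term, the sketch below attaches numbered attribute types to a search term. The type numbers 1 to 6 follow the usual BIB-1 numbering of the six types listed above; the specific values used (Use 4 for title, Structure 2 for word, Truncation 1 for right truncation) are believed to match BIB-1 but should be checked against the attribute set definition. The `describe` function and its output format are hypothetical.

```python
# Attribute types (type number -> name), in the order listed on the slide.
ATTRIBUTE_TYPES = {
    1: "Use",
    2: "Relation",
    3: "Position",
    4: "Structure",
    5: "Truncation",
    6: "Completeness",
}

def describe(term, attributes):
    """Render a term and its {type: value} attributes as readable text."""
    parts = []
    for attr_type, value in sorted(attributes.items()):
        parts.append(f"{ATTRIBUTE_TYPES[attr_type]}={value}")
    return f"{term} [{', '.join(parts)}]"

# Assumed BIB-1 values: Use 4 = title, Structure 2 = word,
# Truncation 1 = right truncation.
query = describe("matri", {5: 1, 1: 4, 4: 2})
```

The resulting string, `matri [Use=4, Structure=2, Truncation=1]`, reads as "search the title field for words beginning with matri".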

12
Attributes and Element Sets
  • Supported Attribute Sets
  • BIB-1, GILS, GEO, STAS
  • Element Sets define retrievable sets of use
    attributes
  • Brief record
  • Full record
  • Summary record (GEO)

13
Record Syntaxes
  • Z39.50 allows specification of a Preferred
    Record Syntax for results
  • SUTRS (unstructured text)
  • HTML
  • USMARC
  • GRS-1 (tagged, generalized syntax)

14
Profiles - GEO and Otherwise
  • Profiles define allowed attributes and element
    sets
  • Usually domain specific - ATS-1, GILS, WAIS, GEO,
    Digital Collections, Museum Collections
  • Supported by external agreement between client
    and server (currently)
  • i.e., a GEO client talks to a GEO server

15
FGDC Enhancements
  • Search Engine (Iindex/Isearch)
  • Field types (text, numeric, date, others)
  • Search in nested fields
  • Search in numeric fields
  • Date and Date Range Searching
  • Spatial Searching
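
Date range and spatial searching over typed fields can be sketched as simple comparisons. The record layout, the `(west, south, east, north)` bounding-box convention, and the function names are assumptions for this example; the slides do not describe how the enhanced Isearch implements these searches.

```python
from datetime import date

# Hypothetical records with typed fields; bbox is (west, south, east, north)
# in decimal degrees.
RECORDS = [
    {"id": "a", "published": date(1995, 3, 1), "bbox": (-80.0, 34.0, -75.0, 37.0)},
    {"id": "b", "published": date(1996, 7, 15), "bbox": (-120.0, 30.0, -110.0, 40.0)},
]

def in_date_range(rec, start, end):
    """Date range search: inclusive at both ends."""
    return start <= rec["published"] <= end

def overlaps(a, b):
    """True if two bounding boxes intersect.

    Boxes that cross the 180-degree meridian are not handled.
    """
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def search(start, end, bbox):
    """Return ids of records inside the date range whose box overlaps bbox."""
    return [
        r["id"]
        for r in RECORDS
        if in_date_range(r, start, end) and overlaps(r["bbox"], bbox)
    ]
```

A bounding-box test like this is cheap because it needs only four comparisons per record; searching against complex regions would need a real polygon intersection test instead.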

16
FGDC Enhancements
  • Z39.50 Implementation (ZDist)
  • Support for GEO attributes and element sets
  • GRS-1 record syntax
  • Support for additional (non-Isearch) search
    engines
  • Syntax to support nested query

17
Outstanding Issues
  • User Interface
  • What fields are searchable and how does the user
    indicate them?
  • How complex can the geographic queries be?
    Bounding box only? Complex regions?