Hiroshi Nakagawa - PowerPoint PPT Presentation

Slides: 12
Provided by: naka55
Transcript and Presenter's Notes

Title: Hiroshi Nakagawa


1
Hiroshi Nakagawa
  • Professor, Digital Library Division,
  • Information Technology Center, The University of
    Tokyo
  • E-mail: nakagawa_at_r.dl.itc.u-tokyo.ac.jp
  • URL: http://www.r.dl.itc.u-tokyo.ac.jp/nakagawa/

2
Searching Academic Information at The University of
Tokyo
  • What is information retrieval? (Useful to know in any case.)
  • What academic information is available here?
  • How can these kinds of information be accessed?
  • Graduate school activities

3
Information Retrieval
[Diagram labels: User terminal; "I'd like to find book XXX"; Query: XXX;
Internet; IR system (OPAC, library information); IR engine; many kinds of
databases; Results]
4
Types of Information Resources
  • Original Documents (paper form)
  • Original Documents (electronic form, for
    example online journals)
  • Index Databases
  • Keyword index
  • Contents: Zassaku, SwetScan
  • Abstract: BookContents
  • Citation index: Web of Science
  • Online Public Access Catalog (OPAC, WebOPAC)

5
Structure of IR system
[Diagram labels: query Q; search engine; posting file holding pointers
(DOC1, DOC2, DOC3) for the documents that coincide with the query; doc1,
doc2, doc3, which are the documents themselves, or catalog and/or location
information]
6
Posting file
[Diagram labels: inverted index file; document set]
7
Structure of IR system
  • Each document to be retrieved is indexed with several
    terms.
  • A user inputs a query consisting of keywords
    that seem to identify the document he/she wants.
  • A search engine interprets this query to find
    the requested documents. Usually more than one
    document fits the query well. These well-fitting
    documents are reached via the pointers in the
    posting file.
  • Information about a document, such as its location,
    catalog entry, URL, and so forth, is stored in the
    document database.
  • Recently, even the documents themselves are stored
    and retrieved. This is the so-called online journal.
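
The structure described in this slide can be sketched in a few lines of Python. This is only a minimal illustration, not the system from the lecture: the three documents, the `DOC_DB` location entries, and the function names are all made up for the example. Each document is indexed with its terms, and the posting file maps every term to pointers (document ids) that lead into the document database.

```python
from collections import defaultdict

# Hypothetical toy collection: document id -> text to be indexed.
DOCS = {
    "doc1": "information retrieval in digital libraries",
    "doc2": "online journals and digital libraries",
    "doc3": "machine translation of search queries",
}

# Hypothetical document database: id -> location information.
DOC_DB = {
    "doc1": {"location": "shelf A-1"},
    "doc2": {"location": "online"},
    "doc3": {"location": "shelf C-7"},
}

def build_posting_file(docs):
    """Build an inverted index: term -> set of document ids (pointers)."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return postings

def search(postings, query):
    """Return ids of documents indexed with every keyword in the query."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(postings.get(terms[0], set()))
    for term in terms[1:]:
        result &= postings.get(term, set())
    return sorted(result)

POSTINGS = build_posting_file(DOCS)

if __name__ == "__main__":
    for doc_id in search(POSTINGS, "digital libraries"):
        # Follow the pointer from the posting file into the document database.
        print(doc_id, DOC_DB[doc_id]["location"])
```

The search never scans the documents themselves; it only intersects the pointer sets stored for each keyword, which is why the posting file is built ahead of time.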

8
Term and Retrieval
  • Controlled terms: in a query, a user can use only
    a predetermined set of terms, called controlled terms.
  • Free terms: a user can use any terms to
    identify the documents he/she would like to retrieve.
  • Perfect match search (Boolean search)
  • Best match strategy (vector space model)
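
The difference between the last two items can be illustrated with a small Python sketch. The toy documents and function names are assumptions made for this example, and the vector space part uses raw term counts with cosine similarity, a simplification (real systems usually add term weighting such as tf-idf). Perfect match returns only documents containing every query term; best match ranks all documents that share at least one term with the query.

```python
import math
from collections import Counter

# Hypothetical toy collection used only for this illustration.
DOCS = {
    "doc1": "digital library search",
    "doc2": "digital library digital archive",
    "doc3": "machine translation",
}

def tokenize(text):
    return text.lower().split()

def boolean_and(docs, query):
    """Perfect match: keep documents that contain every query term."""
    terms = set(tokenize(query))
    return sorted(d for d, text in docs.items() if terms <= set(tokenize(text)))

def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def best_match(docs, query):
    """Best match: rank documents by similarity to the query, best first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), d) for d, text in docs.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [d for score, d in ranked if score > 0]

if __name__ == "__main__":
    print(boolean_and(DOCS, "digital archive"))
    print(best_match(DOCS, "digital archive"))
```

For the query "digital archive", the Boolean search returns only doc2, while the best match ranking also returns doc1 in second place because it shares the term "digital".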

9
Full Text Search
  • Full text search behaves as if the whole text were
    stored in the computer's memory.
  • If the whole text were really stored, we would need
    string matching over a very long full text, which is
    inefficient and unrealistic.
  • In reality, we extract as many index words as
    possible from the text, and use them as indices.
  • We can use queries like the following in full text
    search.
  • In the original text, term XXX appears within
    a distance of five terms from term YYY.
  • Terms XXX and YYY appear in the same
    paragraph.
  • Indexing is still an active research topic and is
    improving day by day. I am one of the researchers
    working on new and efficient indexing methods.
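
The two example queries above can be answered from an index instead of by string matching, provided the index records where each word occurs. The following Python sketch assumes a positional index that stores, for every word, its paragraph number and its word position; the sample text and all function names are made up for this illustration and are not taken from the lecture.

```python
from collections import defaultdict

# Hypothetical two-paragraph text; paragraphs are separated by a blank line.
TEXT = (
    "Indexing makes retrieval fast.\n\n"
    "Full text search uses an index of every word."
)

def build_positional_index(text):
    """Map each word to a list of (paragraph number, word position)."""
    index = defaultdict(list)
    position = 0  # word position counted over the whole text
    for para_no, paragraph in enumerate(text.split("\n\n")):
        for word in paragraph.lower().split():
            index[word.strip(".,;:")].append((para_no, position))
            position += 1
    return index

def within_distance(index, term_x, term_y, max_distance=5):
    """True if term_x occurs within max_distance words of term_y."""
    return any(
        abs(pos_x - pos_y) <= max_distance
        for _, pos_x in index.get(term_x, [])
        for _, pos_y in index.get(term_y, [])
    )

def same_paragraph(index, term_x, term_y):
    """True if both terms occur in at least one common paragraph."""
    paras_x = {para for para, _ in index.get(term_x, [])}
    paras_y = {para for para, _ in index.get(term_y, [])}
    return bool(paras_x & paras_y)

INDEX = build_positional_index(TEXT)

if __name__ == "__main__":
    print(within_distance(INDEX, "retrieval", "search"))
    print(same_paragraph(INDEX, "retrieval", "search"))
```

In this sketch the distance is counted across paragraph boundaries, so "retrieval" and "search" are within five words of each other even though they are in different paragraphs. Whether a real system should count that way is a design choice.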

10
Cross Language Information Retrieval
  • The search keywords are in language A.
  • The databases are written in a different language, B.
  • The query keywords are translated into language B
    and then used to search the database in language B.
  • The results are, ideally, translated back into language A
    by a machine translation system.
  • Our CLIR system (yet to be fully implemented)
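
The query translation step can be sketched as follows. This is a toy illustration under stated assumptions, not the CLIR system mentioned on the slide: the bilingual dictionary, the romanized Japanese index terms, and the function names are all hypothetical, and translating the results back into language A is left out.

```python
# Hypothetical dictionary from language A (English) to language B
# (romanized Japanese). A keyword may have several candidate translations.
A_TO_B = {
    "library": ["toshokan"],
    "search": ["kensaku", "tansaku"],
    "journal": ["zasshi"],
}

# Hypothetical language B database: document id -> index terms.
DB_B = {
    "doc1": {"toshokan", "kensaku"},
    "doc2": {"zasshi", "sakuin"},
    "doc3": {"toshokan", "zasshi"},
}

def translate_query(keywords, dictionary):
    """Translate each keyword into its candidate terms in language B.

    Keywords missing from the dictionary are dropped.
    """
    return [dictionary[k] for k in keywords if k in dictionary]

def search_b(db, translated):
    """Find documents that contain at least one candidate per keyword."""
    if not translated:
        return []
    return sorted(
        doc_id
        for doc_id, terms in db.items()
        if all(terms & set(candidates) for candidates in translated)
    )

def clir_search(query):
    """Language A query in, language B document ids out."""
    return search_b(DB_B, translate_query(query.lower().split(), A_TO_B))

if __name__ == "__main__":
    print(clir_search("library search"))
    print(clir_search("journal"))
```

Keeping every candidate translation and accepting any of them is one simple way to cope with ambiguous keywords; it trades precision for recall.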

11
If you would like to know more about IR, visit
the following URL
  • http://www.r.dl.itc.u-tokyo.ac.jp/nakagawa/infoDB/syllabus.html
  • These are lecture notes for graduate school.