
Title: IR Homework


1
IR Homework 2
  • By J. H. Wang
  • Apr. 1, 2009

2
Programming Exercise 2: Query Processing and
Searching
  • Goal: to search for relevant documents
  • Input: a query
  • (simple search: keyword, Boolean)
  • Output: a ranked list of search results from the
    Reuters-21578 collection
  • (details to be described later)

3
Input: User Query
  • Simple search
  • Keyword (ex: Malaysia, Nuclear, ...)
  • Free text (ex: United Nations, Nuclear Submarine Fleet, ...)
  • Simple Boolean search (ex: Israel OR Pakistan, ...)
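
The three query forms above could be handled by one small front end. The following is only a sketch under stated assumptions: the names `parse_query` and `candidate_docs` are hypothetical, Boolean operators are assumed to be the uppercase words AND / OR, and `index` is an assumed inverted index mapping each lowercase term to a set of document IDs. None of this is prescribed by the assignment.

```python
def parse_query(query):
    """Split a query string into (operator, terms).

    Assumption: a query contains at most one kind of Boolean
    operator, written as the uppercase word AND or OR.
    """
    tokens = query.split()
    for op in ("AND", "OR"):
        if op in tokens:
            return op, [t.lower() for t in tokens if t != op]
    # Keyword and free-text queries: every word is a search term.
    return None, [t.lower() for t in tokens]


def candidate_docs(query, index):
    """Return the set of document IDs that match the query.

    `index` is a hypothetical inverted index: term -> set of doc IDs.
    """
    op, terms = parse_query(query)
    postings = [index.get(t, set()) for t in terms]
    if not postings:
        return set()
    if op == "AND":
        return set.intersection(*postings)
    # OR queries and plain keyword/free-text queries: union of postings.
    return set.union(*postings)
```

Treating free text as a union of its terms leaves the ordering of results entirely to the ranking step; a stricter interpretation (for example, phrase matching) would belong to the optional features.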

4
Output: Ranked List
  • A ranked list of search results from the
    Reuters-21578 collection
  • Term weighting scheme: TF-IDF
  • Ranking is done in the vector space model, i.e. by
    the cosine similarity between the query and document
    vectors

$$w_{ij} = (1 + \log tf_{ij}) \cdot \log(N / df_i)$$
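
The TF-IDF weighting and cosine ranking on this slide could be sketched as follows. This is a minimal illustration, not a reference solution: the function names are hypothetical, documents are assumed to be already tokenized into lists of lowercase terms, the natural logarithm is assumed (the slide does not state a base), and the query is weighted with the same formula as the documents.

```python
import math
from collections import Counter


def tfidf_vector(tokens, df, n_docs):
    """Weight each term as (1 + log tf) * log(N / df)."""
    weights = {}
    for term, tf in Counter(tokens).items():
        if df.get(term, 0) > 0:  # skip terms unseen in the collection
            weights[term] = (1 + math.log(tf)) * math.log(n_docs / df[term])
    return weights


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query_tokens, docs):
    """Rank documents against a query.

    `docs` is a hypothetical mapping: doc ID -> list of tokens.
    Returns (doc_id, score) pairs, best first, omitting zero scores.
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # count each term once per document
    q_vec = tfidf_vector(query_tokens, df, n_docs)
    scores = {
        doc_id: cosine(q_vec, tfidf_vector(tokens, df, n_docs))
        for doc_id, tokens in docs.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(doc_id, score) for doc_id, score in ranked if score > 0]
```

This sketch recomputes every document vector per query for clarity; a real submission would more likely precompute document weights and norms at indexing time.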
5
Example Output
  • Ex:
  • Query: Bangladesh
  • Result: <Doc> <similarity score>
  • 244 0.85
  • 443 0.67
  • 413 0.3
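
One possible way to print results in the format shown above, assuming the ranking step yields (document ID, score) pairs sorted best first. The function name `format_results`, the `top_k` cutoff, and the two-decimal rounding are illustrative assumptions.

```python
def format_results(ranked, top_k=10):
    """Render (doc_id, score) pairs as '<Doc> <similarity score>' lines."""
    lines = []
    for doc_id, score in ranked[:top_k]:
        lines.append(f"{doc_id} {score:.2f}")
    return "\n".join(lines)
```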

6
Optional Features
  • Optional functionalities:
  • Better user interface for queries
  • Complex queries: phrase, substring, proximity
    search, combinations of Boolean operators, ...
  • Stopword removal, champion lists,
    impact-ordering, tiered index, ...
  • Different ranking/term weighting schemes:
    variants of TF-IDF, ...
  • Each optional feature should be switchable off
    via a parameter
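
The requirement that optional features can be turned off by a parameter might be met with command-line flags. The sketch below is an assumption-laden illustration: the flag names `--no-stopwords` and `--champions`, the tiny `STOPWORDS` set, and the helper functions are all hypothetical, and only two of the listed features (stopword removal and champion lists) are shown.

```python
import argparse

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "and", "of", "the", "to"}


def build_parser():
    """Command-line options that switch optional features on or off."""
    parser = argparse.ArgumentParser(description="Search sketch")
    parser.add_argument("query")
    parser.add_argument("--no-stopwords", dest="stopwords",
                        action="store_false",
                        help="turn stopword removal off")
    parser.add_argument("--champions", type=int, default=0, metavar="R",
                        help="use champion lists of size R (0 = off)")
    return parser


def preprocess(tokens, remove_stopwords=True):
    """Lowercase tokens and optionally drop stopwords."""
    tokens = [t.lower() for t in tokens]
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


def champion_list(postings, r):
    """Keep the r highest-weighted documents for one term.

    `postings` maps doc ID -> term weight; r <= 0 disables the feature.
    """
    if r <= 0:
        return dict(postings)
    top = sorted(postings.items(), key=lambda kv: kv[1], reverse=True)[:r]
    return dict(top)
```

Under these assumptions, a run with `--no-stopwords --champions 0` would fall back to the baseline behaviour, which makes it easy to compare results with and without each feature.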

7
Submission
  • Your submission should include:
  • The source code (and optionally your executable
    file)
  • A one-page description that includes the
    following:
  • Major features of your work (ex: high efficiency,
    low storage, multiple input formats, huge corpus,
    ...)
  • Major difficulties encountered
  • Special requirements for the execution environment
    (ex: Java Runtime Environment, special compilers,
    ...)
  • For team work, the name of each member and the
    parts each was responsible for should be clearly
    identified
  • Due: two weeks (Apr. 15, 2009)

8
Submission Instructions
  • Programs or homework in electronic files must be
    submitted directly to the TA as follows:
  • Team members list: please e-mail your team
    members list to the TA (t6598006 _at_ ntut.edu.tw)
    even if you're the only team member
  • Preparing the submission file: one single compressed
    file named <TeamID-HWn.zip>, for example,
    IR0901-HW1.ZIP
  • Remember to specify the names and student IDs of
    your team members in the files and
    documentation
  • E-mail or online submission: TBD
  • If you cannot successfully submit your work,
    please contact the TA or the instructor

9
Evaluation
  • Some example queries will be submitted to your
    program, and the ranked list will be checked for
    effectiveness (recall and precision)
  • Optional features will be considered as a bonus:
  • Various query forms, weighting schemes, efficient
    scoring and ranking, ...
  • You might be required to give a demo if the TA is
    unable to run the submitted program
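
For self-checking before submission, recall and precision can be computed from a ranked list and a set of known relevant documents. This is a sketch only: the function name is hypothetical, and the relevance judgments would have to come from you, since the slides do not say which judgments the TA will use.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall
```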

10
Questions?