RSS Information Retrieval and Natural Language Search - PowerPoint PPT Presentation

1
RSS Information Retrieval and Natural Language
Search
By Vandan Gogna
  • CS 731
  • Professor Heng Ji

2
Proposal
  • The purpose of this project is to create an
    application that will use Natural Language
    Processing on data from different sources.
  • The sources may be RSS feeds (internet) and/or
    documents present locally on the machine.
  • Example web sources: Reuters, CNN, BBC, etc.
  • Search will be focused on keywords that are
    provided by the user.
  • The Vector Space Model is used to return the most
    contextually relevant results to the user.
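
One common way to apply a Vector Space Model is to weight terms by TF-IDF and rank documents by cosine similarity to the query. The sketch below is an illustration in Python, not the project's own code; the function names `tokenize` and `rank` and the smoothed IDF formula are assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, documents):
    """Return (score, doc_index) pairs, most similar document first."""
    doc_tokens = [tokenize(d) for d in documents]
    n_docs = len(documents)

    # Document frequency: the number of documents each term appears in.
    df = Counter()
    for tokens in doc_tokens:
        df.update(set(tokens))

    def idf(term):
        # Smoothed inverse document frequency (an assumed variant).
        return math.log((1 + n_docs) / (1 + df[term])) + 1.0

    def vectorize(tokens):
        # Sparse TF-IDF vector stored as {term: weight}.
        return {t: count * idf(t) for t, count in Counter(tokens).items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        if norm_a == 0.0 or norm_b == 0.0:
            return 0.0
        return dot / (norm_a * norm_b)

    q_vec = vectorize(tokenize(query))
    scored = [(cosine(q_vec, vectorize(t)), i) for i, t in enumerate(doc_tokens)]
    # Highest score first; ties are broken by original position.
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))
```

Calling `rank("senate budget", headlines)` on a list of headline strings returns the indices of those headlines ordered by similarity, with a score of 0.0 for any headline that shares no term with the query.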

3
Proposal (cont.)
  • The end user specifies a keyword via the UI
    (user interface). The application then searches
    different RSS feeds for that keyword and produces
    the most relevant results for the user.
  • Once the initial search has retrieved the data
    from each individual web feed, the results are
    fed to the IR (information retrieval) tool, which
    performs an index search and sorts the data by
    relevance.
  • Because the data is retrieved from the RSS feeds
    at query time, the results reflect the most
    recent items published on a particular topic.
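
The initial keyword search over already-downloaded feed items might look like the following Python sketch. It assumes each item is a dictionary with `title`, `description`, and `published` keys, where `published` is an RFC 822 date string as used by RSS 2.0 `pubDate`; these key names and the function `search_items` are hypothetical, and the relevance ranking by the IR tool is a separate step not shown here.

```python
from email.utils import parsedate_to_datetime


def search_items(items, keyword):
    """Return the items that mention the keyword, newest first."""
    needle = keyword.lower()
    matches = [
        item for item in items
        if needle in (item["title"] + " " + item["description"]).lower()
    ]
    # Sort by publication time so the most recent items come first.
    return sorted(
        matches,
        key=lambda item: parsedate_to_datetime(item["published"]),
        reverse=True,
    )
```

All date strings should carry a time zone (for example `+0000`) so that the parsed timestamps can be compared with one another.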

4
GOAL
  • The goal of this project is to effectively use
    NLP to search for specific keywords across
    various categories of news articles that are
    obtained from different sources.
  • Allow the user to search seamlessly and
    transparently for topics ranging from the most
    recently published to a few days old.
  • Allow the user to see the final results ordered
    by indexing, from the most relevant to the least
    relevant topic.

5
GOAL (cont.)
  • The user specifies a request, and the application
    returns the results in a two-part process:
  • First, it searches all the RSS news feeds or the
    local machine according to the user input.
  • Second, it returns the results from all
    user-specified feeds, ordered from the most
    relevant article based on indexing.

6
GOAL (cont.)
  • Better use of time spent searching.
  • A greater number of precise and accurate results.
  • Little or no time spent searching multiple
    different sources.
  • Web-based results that are ranked and can be
    easily viewed and accessed.
  • Once the results from the query are indexed, the
    final list of results can be searched almost
    instantaneously, giving a quick response time
    with little latency.
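
The fast lookup described in the last bullet is typically achieved with an inverted index, which maps each term to the set of documents that contain it. The Python sketch below is a minimal illustration under that assumption; the names `build_index` and `lookup` are hypothetical and the slides do not say which index structure the application uses.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return index


def lookup(index, query):
    """Return the sorted ids of documents containing every query term."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return []
    # Intersect the posting sets; a missing term yields an empty set.
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))
```

The index is built once per result set; each later lookup only intersects small sets instead of rescanning every document.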

7
Benefits and Features
  • Quick information retrieval
  • Ranked documents/links
  • Data classification (indexing)
  • Time-efficient results
  • Can provide better sources of information for
    academic research topics and information analysis

8
Implementation
  • Searchable and indexable sources
      • RSS news feeds
      • XML files
      • Text files
      • Source code files
  • Languages and technologies used
      • C#
      • ASP.NET
      • XML parsing

9
What is RSS
  • RSS is a family of Web feed formats used to
    publish frequently updated works such as blog
    entries, news headlines, audio, and video in a
    standardized format.
  • An RSS document (which is called a "feed", "web
    feed", or "channel") includes full or
    summarized text, plus metadata such as publishing
    dates and authorship.
  • RSS feeds can be read using software called an
    "RSS reader", "feed reader", or "aggregator",
    which can be web-based or desktop-based.
  • Examples: Firefox, Google Reader, My Yahoo, etc.

10
RSS Feed (cont)
  • Example of an RSS file/feed:

    <?xml version="1.0"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns="http://purl.org/rss/1.0/">
      <channel rdf:about="http://www.xml.com/xml/news.rss">
        <title>XML.com</title>
        <link>http://xml.com/pub</link>
        <description>
          XML.com features a rich mix of information and services
          for the XML community.
        </description>
        <image rdf:resource="http://xml.com/universal/images/xml_tiny.gif" />
        <items>
          <rdf:Seq>
            <rdf:li rdf:resource="http://xml.com/pub/2000/08/09/xslt/xslt.html" />
            <rdf:li rdf:resource="http://xml.com/pub/2000/08/09/rdfdb/index.html" />
          </rdf:Seq>
        </items>
        <textinput rdf:resource="http://search.xml.com" />
      </channel>
      <image rdf:about="http://xml.com/universal/images/xml_tiny.gif">
        <title>XML.com</title>
        <link>http://www.xml.com</link>
        <url>http://xml.com/universal/images/xml_tiny.gif</url>
      </image>
      <item rdf:about="http://xml.com/pub/2000/08/09/xslt/xslt.html">
        <title>Processing Inclusions with XSLT</title>
        <link>http://xml.com/pub/2000/08/09/xslt/xslt.html</link>
        <description>
          Processing document inclusions with general XML tools can be
          problematic. This article proposes a way of preserving inclusion
          information through SAX-based processing.
        </description>
      </item>
      <item rdf:about="http://xml.com/pub/2000/08/09/rdfdb/index.html">
        <title>Putting RDF to Work</title>
        <link>http://xml.com/pub/2000/08/09/rdfdb/index.html</link>
        <description>
          Tool and API support for the Resource Description Framework is
          slowly coming of age. Edd Dumbill takes a look at RDFDB, one of
          the most exciting new RDF toolkits.
        </description>
      </item>
      <textinput rdf:about="http://search.xml.com">
        <title>Search XML.com</title>
        <description>Search XML.com's XML collection</description>
        <name>s</name>
        <link>http://search.xml.com</link>
      </textinput>
    </rdf:RDF>
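
A feed in this RSS 1.0 (RDF) format can be read with a namespace-aware XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` for illustration; the slides do not show the project's own XML parsing code. `SAMPLE_FEED` is an abbreviated copy of the example above, and the `rss` prefix and the function name `parse_items` are assumptions chosen for this example.

```python
import xml.etree.ElementTree as ET

# Prefixes used in the search paths below; "rss" is an arbitrary label
# for the RSS 1.0 default namespace.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
}

SAMPLE_FEED = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://www.xml.com/xml/news.rss">
    <title>XML.com</title>
    <link>http://xml.com/pub</link>
  </channel>
  <item rdf:about="http://xml.com/pub/2000/08/09/xslt/xslt.html">
    <title>Processing Inclusions with XSLT</title>
    <link>http://xml.com/pub/2000/08/09/xslt/xslt.html</link>
    <description>Processing document inclusions with general XML tools
    can be problematic.</description>
  </item>
  <item rdf:about="http://xml.com/pub/2000/08/09/rdfdb/index.html">
    <title>Putting RDF to Work</title>
    <link>http://xml.com/pub/2000/08/09/rdfdb/index.html</link>
    <description>Tool and API support for the Resource Description
    Framework is slowly coming of age.</description>
  </item>
</rdf:RDF>"""


def parse_items(xml_text):
    """Return a list of {title, link, description} dicts, one per item."""
    root = ET.fromstring(xml_text)
    items = []
    # In RSS 1.0 the <item> elements are direct children of <rdf:RDF>.
    for item in root.findall("rss:item", NS):
        items.append({
            "title": item.findtext("rss:title", default="", namespaces=NS).strip(),
            "link": item.findtext("rss:link", default="", namespaces=NS).strip(),
            "description": " ".join(
                item.findtext("rss:description", default="", namespaces=NS).split()
            ),
        })
    return items
```

RSS 2.0 feeds place `<item>` elements inside `<channel>` and use no default namespace, so the search paths would need to change for those feeds.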

11
RSS feed Examples
  • Sources
      • Yahoo:
        http://rss.news.yahoo.com/rss/topstories
      • Reuters:
        http://feeds.reuters.com/Reuters/PoliticsNews?format=xml
      • Washington Post:
        http://feeds.washingtonpost.com/wp-dyn/rss/politics/index_xml
      • BBC:
        http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/world/rss.xml
      • Google:
        http://news.google.com/?output=rss
      • MSNBC:
        http://rss.msnbc.msn.com/id/3032091/device/rss/rss.xml

12
Use case scenarios
  • User wants to search for a keyword over the web.
  • User may want to search for a keyword in
    documents present in a particular location on
    their desktop (C:\, My Documents, etc.).
  • User may want to see the relevance of topics
    compared with subtopics or equivalent keywords.
    Examples: employee vs. personnel, single
    inheritance vs. multiple inheritance, Java vs. C,
    etc.
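
The local-document and equivalent-keyword scenarios could be combined as in the Python sketch below. It is only an illustration: the `SYNONYMS` table, the function names `expand` and `search_folder`, and the default file suffixes are hypothetical, and a real application would likely draw equivalent keywords from a thesaurus rather than a hard-coded table.

```python
from pathlib import Path

# Hypothetical table of equivalent keywords, using the pair from the slide.
SYNONYMS = {
    "employee": {"personnel"},
    "personnel": {"employee"},
}


def expand(keyword):
    """Return the keyword plus any equivalent keywords, lowercased."""
    key = keyword.lower()
    return {key} | SYNONYMS.get(key, set())


def search_folder(folder, keyword, suffixes=(".txt", ".xml")):
    """Return names of files under folder that mention any expanded term."""
    terms = expand(keyword)
    hits = []
    # rglob("*") walks the folder and all of its subfolders.
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore").lower()
            if any(term in text for term in terms):
                hits.append(path.name)
    return hits
```

With this table, a search for "employee" also returns files that only mention "personnel".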

13
Use Case Scenario (cont.)
  • User will enter a search query. (input)
  • User will choose what sources they want to search
    (input).
  • Search is applied. (background operation)
  • Results are indexed. (background operation)
  • Results are displayed. (Final output)

14
System Architecture

15
(No Transcript)

16
Final Thoughts
  • Allows the user to find answers quickly.
  • Uses robust technologies such as C# and ASP.NET
    to implement a well-supported application.
  • Provides a web-enabled UI that allows the user to
    access the website from any remote machine.
  • Final results are displayed in order of rank,
    highest first, so the user can view the complete
    result list at the end.