MetaWeb Search Engine Design - PowerPoint PPT Presentation

Provided by: ccNct

1
MetaWeb Search Engine Design
2
Introduction
  • Familiarize students with the design of a
    metadata search engine through a case study:
    MetaWeb
  • MetaWeb is a metadata search engine that:
      • Fetches Web resources
      • Indexes the metadata extracted from them
      • Provides a user interface for searching

3
Architecture of MetaWeb
4
Overview of Components (I)
  • Gatherer
      • Reads URLs from the URL database
      • Fetches Web-accessible resources with the Fetcher
      • Parses metadata from <META> tags
      • Encodes the metadata into SOIF (Summary Object
        Interchange Format)
      • Sends the SOIF abstracts to a Repository managed
        by the Broker
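
The parsing step can be illustrated with a minimal Java sketch. It is not the MetaWeb Gatherer's code: the class name `MetaTagExtractor` is hypothetical, and the regular expression assumes double-quoted attribute values with NAME appearing before CONTENT. A real gatherer would more likely use a proper HTML parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of MetaWeb: pulls NAME/CONTENT pairs out of <META> tags.
final class MetaTagExtractor {
    // Assumption: values are double-quoted and NAME precedes CONTENT.
    // Any further attributes (e.g. SCHEME) are skipped by [^>]*.
    private static final Pattern META = Pattern.compile(
        "<meta\\s+name\\s*=\\s*\"([^\"]*)\"\\s+content\\s*=\\s*\"([^\"]*)\"[^>]*>",
        Pattern.CASE_INSENSITIVE);

    // Returns the pairs in document order; a repeated NAME overwrites the earlier value.
    static Map<String, String> extract(String html) {
        Map<String, String> metadata = new LinkedHashMap<>();
        Matcher m = META.matcher(html);
        while (m.find()) {
            metadata.put(m.group(1), m.group(2));
        }
        return metadata;
    }
}
```

Calling `MetaTagExtractor.extract(page)` on a fetched page yields a map such as `DC.Title` to its content string, which is the input the SOIF encoding step needs.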

5
Overview of Components (II)
  • Broker
      • An HTTP daemon
      • Supports five different actions
      • Receives SOIFs from one or more Gatherers
      • Removes duplicate information
      • Saves SOIFs into the metadata repository
      • Indexes the collected information
      • Provides a WWW query interface

6
Implementation
  • A product of the MetaWeb project
      • http://www.dstc.edu.au/Research/metaweb
  • Written in Java
  • Supports the Oracle and mSQL RDBMSs
      • Connects through JDBC

7
URL Database Structure
8
SOIF Example
  • @FILE { http://www7.conf.au/
  • DC.Creator{30}: Andrew Wood, woody@dstc.edu.au
  • DC.Title{47}: The 7th International World Wide
    Web Conference
  • }
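
The number in braces after each attribute name is the size of the value in bytes. The following Java sketch shows how such a record could be produced. The class name `SoifEncoder` is hypothetical, and the sketch assumes UTF-8 byte counts and a tab after the colon. It is an illustration, not the MetaWeb encoder.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical helper, not part of MetaWeb: writes one SOIF record.
final class SoifEncoder {
    static String encode(String templateType, String url, Map<String, String> attributes) {
        StringBuilder sb = new StringBuilder();
        // Header line: "@TYPE { URL"
        sb.append('@').append(templateType).append(" { ").append(url).append('\n');
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            // The size is counted in bytes, not characters (assumption: UTF-8 encoding).
            int size = e.getValue().getBytes(StandardCharsets.UTF_8).length;
            // Assumption: a tab separates the colon from the value.
            sb.append(e.getKey()).append('{').append(size).append("}:\t")
              .append(e.getValue()).append('\n');
        }
        return sb.append("}\n").toString();
    }
}
```

Encoding the creator and title from the example above gives `DC.Creator{30}` and `DC.Title{47}`, matching the sizes on the slide.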

10
Metadata Repository
11
Metadata Repository Example
  • URL_RESOURCE: http://www.dstc.edu.au/RDU
  • COMPLETE_PROPERTY: DC.Date.Modified
  • SEQ_NUMBER: 2
  • LANG: en
  • PROPERTY_VALUE: 1997-01-28
  • VALUE_TIME: String
  • SCHEME: ISO8601
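
One way to picture a repository row is as a small Java class with a JDBC insert statement. This is only a sketch: the column names come from the example above, but the table name `METADATA`, the class name `MetadataRow`, and the column types (an integer sequence number, strings elsewhere) are assumptions, not the MetaWeb schema.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical row class; the column names follow the slide, everything else is assumed.
final class MetadataRow {
    // Assumption: the table is called METADATA.
    static final String INSERT_SQL =
        "INSERT INTO METADATA (URL_RESOURCE, COMPLETE_PROPERTY, SEQ_NUMBER, LANG, "
        + "PROPERTY_VALUE, VALUE_TIME, SCHEME) VALUES (?, ?, ?, ?, ?, ?, ?)";

    final String urlResource;
    final String completeProperty;
    final int seqNumber;
    final String lang;
    final String propertyValue;
    final String valueTime;
    final String scheme;

    MetadataRow(String urlResource, String completeProperty, int seqNumber, String lang,
                String propertyValue, String valueTime, String scheme) {
        this.urlResource = urlResource;
        this.completeProperty = completeProperty;
        this.seqNumber = seqNumber;
        this.lang = lang;
        this.propertyValue = propertyValue;
        this.valueTime = valueTime;
        this.scheme = scheme;
    }

    // Binds this row to a statement prepared from INSERT_SQL.
    void bind(PreparedStatement ps) throws SQLException {
        ps.setString(1, urlResource);
        ps.setString(2, completeProperty);
        ps.setInt(3, seqNumber);
        ps.setString(4, lang);
        ps.setString(5, propertyValue);
        ps.setString(6, valueTime);
        ps.setString(7, scheme);
    }
}
```

With an open JDBC connection, `connection.prepareStatement(MetadataRow.INSERT_SQL)` followed by `row.bind(ps)` and `ps.executeUpdate()` would store one row.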

12
Broker Actions (I)
  • PUT request
      • Receives the SOIF abstract from Gatherers
      • Calls the SOIF parser to parse the SOIF into
        attribute-value pairs, and saves them into the
        Repository
  • GET request
      • Handles search requests from users
      • Finds the matching URLs in the Repository, formats
        the results as an HTML page, and returns it
        to the user
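
The parsing step of a PUT request can be sketched in Java as follows. The class name `SoifParser` is hypothetical, and the sketch assumes a single record, UTF-8 text, and exactly one delimiter character (a tab or a space) after the colon. It does not reproduce the MetaWeb parser.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of MetaWeb: turns one SOIF record into attribute-value pairs.
final class SoifParser {
    static Map<String, String> parse(String soif) {
        byte[] data = soif.getBytes(StandardCharsets.UTF_8);
        Map<String, String> pairs = new LinkedHashMap<>();
        int pos = indexOf(data, (byte) '\n', 0) + 1;      // skip the "@TYPE { URL" header line
        while (pos < data.length && data[pos] != '}') {   // a leading '}' closes the record
            int open = indexOf(data, (byte) '{', pos);
            int close = indexOf(data, (byte) '}', open);
            String name = new String(data, pos, open - pos, StandardCharsets.UTF_8);
            int size = Integer.parseInt(
                new String(data, open + 1, close - open - 1, StandardCharsets.UTF_8));
            int valueStart = close + 3;                   // skip '}', ':' and one delimiter character
            // Read exactly `size` bytes, so values may themselves contain newlines.
            pairs.put(name, new String(data, valueStart, size, StandardCharsets.UTF_8));
            pos = valueStart + size + 1;                  // skip the value and its trailing newline
        }
        return pairs;
    }

    private static int indexOf(byte[] data, byte target, int from) {
        for (int i = from; i < data.length; i++) {
            if (data[i] == target) return i;
        }
        throw new IllegalArgumentException("malformed SOIF: missing '" + (char) target + "'");
    }
}
```

Reading by byte count rather than by line is the reason each attribute carries its size: the parser never has to guess where a multi-line value ends.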

13
Broker Actions (II)
  • FIND request
      • Used when a user wants to see the SOIF abstract
        of a URL
      • Returns the SOIF abstract of the given URL in
        HTML format
  • BROWSE request
      • Used when a user wants to view the harvested URLs
      • Lists the URLs in alphabetical order, each
        linked to its full SOIF abstract
  • DELETE request
      • Deletes the given URL or URLs from the repository
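
The PUT, FIND, BROWSE and DELETE actions can be pictured with a small in-memory Java sketch. The class `BrokerSketch` is hypothetical: it keeps records in a sorted map instead of a database, omits the HTTP and HTML layers, and assumes that a second PUT for the same URL replaces the first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the Broker's repository operations.
final class BrokerSketch {
    // A TreeMap keeps URLs sorted by String order (case-sensitive), so BROWSE needs no extra sort.
    private final TreeMap<String, Map<String, String>> repository = new TreeMap<>();

    // PUT: store the attribute-value pairs of one URL (assumption: a repeated URL is replaced).
    void put(String url, Map<String, String> attributes) {
        repository.put(url, attributes);
    }

    // FIND: return the stored pairs for a URL, or null if it was never harvested.
    Map<String, String> find(String url) {
        return repository.get(url);
    }

    // BROWSE: list all harvested URLs in sorted order.
    List<String> browse() {
        return new ArrayList<>(repository.keySet());
    }

    // DELETE: remove one or more URLs.
    void delete(String... urls) {
        for (String url : urls) {
            repository.remove(url);
        }
    }
}
```

The GET (search) action is left out here because it depends on the query types the Broker supports.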

14
Query Types
  • Full-text search in metadata: words in any
    attribute
      • Any metadata
  • Simple attribute search: one word in one
    attribute
      • DC.Creator renato
  • Complex attribute search: multiple words in one
    attribute
      • DC.Title Queensland education
  • Query with Boolean operators: with the NOT operator
      • DC.Subject cat-dog
  • Combined search: search multiple attributes at
    one time
      • DC.Subject education DC.Title University
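
The query types above can be approximated by a Java sketch that matches one record at a time. The class `QuerySketch` and its method names are hypothetical. The sketch assumes that matching is case-insensitive on whole words, that multiple words in one attribute must all be present, and that NOT excludes records containing the word. The slides do not state these rules, and the query string syntax itself is not parsed here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical matcher, not the MetaWeb query engine.
final class QuerySketch {
    // Splits a value into lower-cased words (ASCII word characters only).
    static Set<String> words(String value) {
        return new HashSet<>(Arrays.asList(value.toLowerCase(Locale.ROOT).split("\\W+")));
    }

    // Attribute search: every required word is present and no excluded (NOT) word is.
    static boolean matchesAttribute(Map<String, String> record, String attribute,
                                    List<String> required, List<String> excluded) {
        String value = record.get(attribute);
        if (value == null) return false;
        Set<String> present = words(value);
        for (String word : required) {
            if (!present.contains(word.toLowerCase(Locale.ROOT))) return false;
        }
        for (String word : excluded) {
            if (present.contains(word.toLowerCase(Locale.ROOT))) return false;
        }
        return true;
    }

    // Full-text search: the word appears in at least one attribute.
    static boolean matchesAnyAttribute(Map<String, String> record, String word) {
        for (String value : record.values()) {
            if (words(value).contains(word.toLowerCase(Locale.ROOT))) return true;
        }
        return false;
    }
}
```

A combined search is then the conjunction of several `matchesAttribute` calls, one per attribute named in the query.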
