CrawlerBased Search Engine - PowerPoint PPT Presentation

About This Presentation
Title:

CrawlerBased Search Engine

Description:

... search engine. A script/bot that searches the web in methodical, automated manner (wikipedia, ... is we are all interested in how a search engine works. ... – PowerPoint PPT presentation

Slides: 13
Provided by: Rage
Transcript and Presenter's Notes


1
Crawler-Based Search Engine
  • By Ryan Caplet, Morris Wright and Bryan Chapman

2
Background
  • Crawler-based search engine
  • A script/bot that searches the web in a methodical,
    automated manner (Wikipedia, "Web crawler")
  • The bot starts with seeds (a small list of URLs) and
    follows the links on those pages to build a bigger
    list of sites to visit.
  • Each newly visited page adds more links, and the
    process repeats.
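
The seed-and-expand loop described on this slide can be sketched roughly as follows. This is only an illustration, not the team's crawler: the names `LinkParser` and `crawl`, the `max_pages` limit, and the caller-supplied `fetch` function are all assumptions made for the example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, max_pages=50):
    """Breadth-first crawl starting from the seed URLs.

    fetch is a hypothetical caller-supplied function that returns the
    HTML of a URL as a string, or None if the page cannot be retrieved.
    Returns the URLs in the order they were visited.
    """
    queue = deque(seeds)
    seen = set(seeds)   # URLs already queued, so none is visited twice
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments.
            link, _ = urldefrag(urljoin(url, href))
            if link.startswith("http") and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Passing `fetch` in as a parameter keeps the sketch testable without network access; a real run would presumably wrap an HTTP client there and restrict links to the network being searched.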

3
Motivation
  • The motivation for this project is that we are all
    interested in how a search engine works.
  • Building one ourselves also gives us more experience
    with various programming languages and tools.

4
Initial Priorities
  • Set up server
  • Set up database
  • Get both fully functional
  • Set up indexer
  • Make indexer work with the web page
  • Ranking
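
As a rough idea of what the indexer step might involve, the sketch below pulls a title and a few keywords out of a page. The slides do not say how keywords are chosen, so picking the most frequent words is purely an assumption, and `PageText` and `index_page` are hypothetical names.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class PageText(HTMLParser):
    """Separates the <title> text from the rest of the page text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.words = []
        self._in_title = False
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip:
            # Lowercase and split into alphanumeric words.
            self.words.extend(re.findall(r"[a-z0-9]+", data.lower()))


def index_page(html, top_n=5):
    """Return (title, keywords), where keywords are the most frequent
    words of three or more characters (an assumed heuristic)."""
    parser = PageText()
    parser.feed(html)
    parser.close()
    counts = Counter(w for w in parser.words if len(w) > 2)
    keywords = [word for word, _ in counts.most_common(top_n)]
    return parser.title.strip(), keywords
```

Word frequency is only a stand-in here; whatever ranking the project settles on could replace the `Counter` step without changing the rest.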

5
Projected Team Member Breakdown
  • Bryan Chapman: the crawler, analyzing files
  • Ryan Caplet: search functions, test functions
  • Morris Wright: UI development, database management,
    web server account manager

6
Development Environment
  • Use of Linux and Apache Web Server
  • A possible place for development is the UCONN ECS
    web server
  • Use of MySQL

7
Programming Languages
  • PHP: for web page programming
  • Perl or Python: possibly for other scripting needs
  • HTML: for displaying web pages
  • Structured Query Language (SQL): interaction with
    the database

8
Database Management - Projected
  • Four fields: ID, Title, URL, Keywords
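
A table with these four fields might look something like the sketch below. The project plans to use MySQL; Python's built-in `sqlite3` is used here only so the example runs on its own. The table name `pages`, the `UNIQUE` constraint on the URL, and the space-separated keyword format are assumptions, not details from the slides.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS pages (
    id       INTEGER PRIMARY KEY,   -- ID
    title    TEXT NOT NULL,         -- Title
    url      TEXT NOT NULL UNIQUE,  -- URL (assumed unique per page)
    keywords TEXT NOT NULL          -- Keywords, stored space-separated
)
"""


def open_index(path=":memory:"):
    """Open the database and create the table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_page(conn, title, url, keywords):
    """Insert a crawled page; a URL that is already stored is ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO pages (title, url, keywords) VALUES (?, ?, ?)",
        (title, url, " ".join(keywords)),
    )
    conn.commit()
```

In MySQL the same idea would likely use `AUTO_INCREMENT` for the ID and `INSERT IGNORE` in place of SQLite's `INSERT OR IGNORE`.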

9
Projected Security Concerns
  • Prevent Injections
  • Make sure search queries match what is in the
    database
  • Filter through webpage tags
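
One common way to address the first and last points is to pass the search term as a bound parameter and to strip tags from page text before storing it. The sketch below assumes a `pages` table with `id`, `title`, `url` and `keywords` columns and again uses `sqlite3` as a stand-in for MySQL; `strip_tags` and `search` are hypothetical helper names.

```python
import sqlite3
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Keeps only the text between tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    """Return the text content of an HTML fragment with tags removed."""
    stripper = TagStripper()
    stripper.feed(html)
    stripper.close()
    return "".join(stripper.parts)


def search(conn, term):
    """Find pages whose keywords contain the term.

    The term is bound with a ? placeholder, so it is never spliced
    into the SQL text and cannot change the structure of the query.
    """
    # Escape LIKE wildcards so user input is matched literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    pattern = "%" + escaped + "%"
    return conn.execute(
        "SELECT title, url FROM pages "
        "WHERE keywords LIKE ? ESCAPE '\\' ORDER BY id",
        (pattern,),
    ).fetchall()
```

With this approach a term such as `' OR '1'='1` is treated as an ordinary string to look for, so it simply matches nothing.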

10
Basic Use
  • Our basic scope is to search the UCONN network
    for pages that match a given search term
  • URLs that are crawled are going to be added to
    an SQL database.

11
Test Plans
  • Test plans for this project will be:
  • Keeping rendering consistent across different
    OSs/browsers
  • Checking that search queries match what is in
    the database

12
Conclusion
  • And that is it!