The Anatomy of a Large-Scale Hypertextual Web Search Engine - PowerPoint PPT Presentation

Slides: 16

Provided by: joseph272

Transcript and Presenter's Notes

1
The Anatomy of a Large-Scale Hypertextual Web
Search Engine

Sergey Brin, Lawrence Page
Presented by
Siddharth Sriram, Joseph Xavier
Department of Electrical and Computer Engineering

2
Overview

@ Stanford University
Presented as a prototype of a large-scale search
engine
26 million pages, 147 GB
Google: a common spelling of googol (10^100)
Issues
Scaling
Exploiting structure in Hypertext
PageRank Algorithm
Architecture
Data Structures, Crawling, Indexing, Searching
Results

3

PageRank Algorithm using link graph
Anchor Text
Associate the anchor text of a link to the page
it points to
Information Retrieval
TREC -> well-controlled, homogeneous collections
Not equipped to handle Hypertext documents
Vector Space Model not enough
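
The PageRank item on this slide can be illustrated with a small sketch. It iterates the formula given in the Brin and Page paper, PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where T1..Tn are the pages linking to A and C(T) is the number of links leaving T. The function name, the toy graph, the iteration count, and the handling of pages without out-links are assumptions made for illustration, not details from the slides.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over a link graph.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {page: 1.0 for page in pages}
    for _ in range(iterations):
        new_rank = {page: 1.0 - d for page in pages}
        for source, targets in links.items():
            if not targets:
                continue  # assumption: a page with no out-links passes on nothing
            share = d * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
toy_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy_links)
```

In this toy graph C collects rank from both A and B, so it ends up ranked above B.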

4
Architecture

URL Server
Distributed Crawlers
Storeserver
Repository
Indexer
Barrels
URL Resolver
Sorter
DumpLexicon
Searcher

5
Data Structures

BigFiles
Repository
Document Index
Lexicon
Hit Lists
Forward Index
Inverted Index

6
Repository

Full HTML of every webpage
Compressed using zlib
Prefixed by docID, length, URL
Files stored one after another
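
A minimal sketch of such a repository, assuming a made-up header layout: the slide only says each page is zlib-compressed and prefixed by docID, length, and URL, so the field widths, the byte order, and the names `pack_record` and `unpack_records` below are hypothetical.

```python
import struct
import zlib

# Hypothetical header: docID (4 bytes), URL length (2 bytes), compressed length (4 bytes).
HEADER = struct.Struct("<IHI")


def pack_record(doc_id, url, html):
    """Compress one page with zlib and prefix it with docID, lengths, and URL."""
    url_bytes = url.encode("utf-8")
    body = zlib.compress(html.encode("utf-8"))
    return HEADER.pack(doc_id, len(url_bytes), len(body)) + url_bytes + body


def unpack_records(blob):
    """Walk records stored one after another and yield (docID, URL, HTML)."""
    offset = 0
    while offset < len(blob):
        doc_id, url_len, body_len = HEADER.unpack_from(blob, offset)
        offset += HEADER.size
        url = blob[offset:offset + url_len].decode("utf-8")
        offset += url_len
        html = zlib.decompress(blob[offset:offset + body_len]).decode("utf-8")
        offset += body_len
        yield doc_id, url, html


repository = (
    pack_record(1, "http://a.example/", "<html>first</html>")
    + pack_record(2, "http://b.example/", "<html>second</html>")
)
records = list(unpack_records(repository))
```

Because every record carries its own lengths, the file can be read sequentially without any other data structure.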

7
Document Index

Fixed-width ISAM index
Stores document status, a pointer into the repository,
and a document checksum
If the document has been crawled, a pointer into the
variable-length docinfo file is stored
Otherwise, a pointer into the URLlist is stored
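
One way to picture the fixed-width layout: if every entry has the same size, the entry for a docID sits at docID times the entry size, so it can be read with a single seek. The sketch below assumes hypothetical field widths and a single pointer field whose meaning depends on the status; none of these specifics come from the slides.

```python
import struct

# Hypothetical fixed-width entry: status (1 byte), pointer (8 bytes), checksum (4 bytes).
ENTRY = struct.Struct("<BQI")
NOT_CRAWLED, CRAWLED = 0, 1


def write_entry(index, doc_id, status, pointer, checksum):
    """Store an entry at offset doc_id * ENTRY.size, growing the index if needed."""
    start = doc_id * ENTRY.size
    if len(index) < start + ENTRY.size:
        index.extend(bytes(start + ENTRY.size - len(index)))
    ENTRY.pack_into(index, start, status, pointer, checksum)


def read_entry(index, doc_id):
    """Return (status, pointer, checksum) for a docID without scanning."""
    return ENTRY.unpack_from(index, doc_id * ENTRY.size)


doc_index = bytearray()
# Crawled document: the pointer is taken to be an offset into the docinfo file.
write_entry(doc_index, 0, CRAWLED, 4096, 0xDEADBEEF)
# Uncrawled document: the pointer is taken to be an offset into the URLlist.
write_entry(doc_index, 2, NOT_CRAWLED, 17, 0)
```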

8
Hit Lists

Plain and Fancy hits
2 bytes for each hit
Length of the hit list
stored before the hits themselves
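
As an illustration of a 2-byte hit, the sketch below packs a plain hit using the bit layout described in the Brin and Page paper (1 capitalization bit, 3 bits of font size, 12 bits of word position); the slide itself only gives the 2-byte size. The function names, the clamping of large positions, and the one-byte length prefix are assumptions.

```python
import struct


def pack_plain_hit(capitalized, font_size, position):
    """Pack a plain hit into 16 bits: 1 bit cap, 3 bits font size, 12 bits position."""
    if not 0 <= font_size <= 6:
        raise ValueError("font size 7 is reserved to mark fancy hits")
    position = min(position, 0xFFF)  # assumption: clamp positions that overflow 12 bits
    return (int(capitalized) << 15) | (font_size << 12) | position


def unpack_plain_hit(hit):
    """Return (capitalized, font_size, position) from a 16-bit plain hit."""
    return bool(hit >> 15), (hit >> 12) & 0x7, hit & 0xFFF


def encode_hit_list(hits):
    """Write the list length first, then each hit as 2 bytes."""
    return struct.pack("<B", len(hits)) + struct.pack(f"<{len(hits)}H", *hits)


hit = pack_plain_hit(capitalized=True, font_size=3, position=42)
encoded = encode_hit_list([hit, pack_plain_hit(False, 1, 7)])
```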

9
Forward Index

Stored in 64 barrels.
If a document contains words in a barrel, then
the docID is recorded into the barrel, with the
list of wordIDs and hitlists.
Each wordID stored as a relative difference from
the minimum wordID in a barrel. (24 bits for the
wordID, 8 for hitlist length).
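
The 24 + 8 bit split can be sketched as one 32-bit value per word entry. The names `pack_entry` and `unpack_entry` and the example wordIDs are hypothetical; only the bit widths and the relative-to-minimum encoding come from the slide.

```python
def pack_entry(word_id, barrel_min_word_id, hit_count):
    """Pack a wordID (relative to the barrel minimum) and a hit count into 32 bits."""
    relative = word_id - barrel_min_word_id
    if not 0 <= relative < (1 << 24):
        raise ValueError("relative wordID does not fit in 24 bits")
    if not 0 <= hit_count < (1 << 8):
        raise ValueError("hit count does not fit in 8 bits")
    return (relative << 8) | hit_count


def unpack_entry(entry, barrel_min_word_id):
    """Recover (wordID, hit count) from a packed 32-bit entry."""
    return (entry >> 8) + barrel_min_word_id, entry & 0xFF


# Hypothetical barrel whose smallest wordID is 1,000,000.
entry = pack_entry(word_id=1_000_123, barrel_min_word_id=1_000_000, hit_count=5)
```

Storing the difference rather than the full wordID is what lets each barrel cover a range of wordIDs while spending only 24 bits per entry.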

10
Inverted Index

Same barrels as forward index, but processed by
the sorter.
For every wordID, doclist of docIDs generated,
with corresponding hitlists.
Two sets of inverted barrels, one for hitlists
with anchor or title text, another for all
hitlists.
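
The sorter's job can be sketched as regrouping forward-index entries by wordID. This in-memory version is only an illustration under assumed data shapes (tuples and dictionaries rather than on-disk barrels); the function name `invert` is hypothetical.

```python
from collections import defaultdict


def invert(forward_barrel):
    """Turn (docID, [(wordID, hitlist), ...]) entries into wordID -> doclist."""
    inverted = defaultdict(list)
    for doc_id, words in sorted(forward_barrel, key=lambda item: item[0]):
        for word_id, hitlist in words:
            inverted[word_id].append((doc_id, hitlist))
    # Return wordIDs in sorted order; each doclist is already ordered by docID.
    return dict(sorted(inverted.items()))


# Hypothetical forward barrel with two documents.
forward = [
    (2, [(7, [3, 9]), (5, [1])]),
    (1, [(7, [4])]),
]
inverted = invert(forward)
```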

11
Indexing the Web

Parser: flex used to generate a lexical analyzer;
involved a fair amount of work
Indexing Documents into barrels
Every word hashed into wordID
Occurrences translated into hitlists and written
into forward barrels
Lexicon needs to be shared
Extra words written into a log, processed by one
final indexer
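
The last two points can be sketched as follows: each parallel indexer looks words up in a fixed base lexicon and logs any word it does not find, and a single final pass assigns wordIDs to the logged words. The function names and the ID-assignment scheme (next unused integer) are assumptions for illustration.

```python
def index_document(words, base_lexicon, extra_log):
    """Map known words to wordIDs; append unknown words to this indexer's log."""
    word_ids = []
    for word in words:
        word_id = base_lexicon.get(word)
        if word_id is None:
            extra_log.append(word)
        else:
            word_ids.append(word_id)
    return word_ids


def final_indexer(base_lexicon, extra_logs):
    """Assign new wordIDs to logged words once, after the parallel indexers finish."""
    lexicon = dict(base_lexicon)
    next_id = max(lexicon.values(), default=-1) + 1
    for log in extra_logs:
        for word in log:
            if word not in lexicon:
                lexicon[word] = next_id
                next_id += 1
    return lexicon


base = {"web": 0, "search": 1}
log_a, log_b = [], []
ids_a = index_document(["web", "crawler"], base, log_a)
ids_b = index_document(["search", "crawler", "barrel"], base, log_b)
full_lexicon = final_indexer(base, [log_a, log_b])
```

Because the indexers only read the base lexicon, they need no locking; all writes happen in the single final pass.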

12
Searching

1. Parse the query.
2. Convert words into wordIDs.
3. Seek to the start of the doclist in the short
barrel for every word.
4. Scan through the doclists until there is a
document that matches all the search terms.
5. Compute the rank of that document for the query.
6. If we are in the short barrels and at the end of
any doclist, seek to the start of the doclist in
the full barrel for every word and go to step 4.
7. If we are not at the end of any doclist go to
step 4.
8. Sort the documents that have matched by rank and
return the top k.
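
The query steps above can be sketched in a simplified, in-memory form. Here each barrel is assumed to be a dictionary from wordID to a list of docIDs, the scan over doclists is replaced by a set intersection, and `rank` is a caller-supplied scoring function; all of these are illustrative assumptions rather than the system's actual structures.

```python
def matching_docs(word_ids, barrel):
    """Return the docIDs that appear in the doclist of every query word."""
    doclists = [set(barrel.get(word_id, ())) for word_id in word_ids]
    return set.intersection(*doclists) if doclists else set()


def search(query, lexicon, short_barrel, full_barrel, rank, k=10):
    words = query.lower().split()                       # parse the query
    if not words or any(word not in lexicon for word in words):
        return []                                       # an unknown word matches nothing
    word_ids = [lexicon[word] for word in words]        # convert words into wordIDs
    matches = matching_docs(word_ids, short_barrel)     # short barrels first
    matches |= matching_docs(word_ids, full_barrel)     # then the full barrels
    scored = [(rank(doc_id, word_ids), doc_id) for doc_id in matches]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))   # sort by rank, best first
    return [doc_id for _, doc_id in scored[:k]]         # return the top k


# Hypothetical data: two words, three documents, a stand-in score per document.
lexicon = {"web": 1, "search": 2}
short_barrel = {1: [10], 2: [10]}
full_barrel = {1: [10, 11, 12], 2: [10, 12]}
scores = {10: 0.5, 11: 0.9, 12: 0.7}
results = search("web search", lexicon, short_barrel, full_barrel,
                 rank=lambda doc_id, word_ids: scores[doc_id])
```

Document 11 has the highest stand-in score but contains only one of the two words, so it is not returned.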

13
Ranking

Count weight generated for each word in query
Dot product taken with type weight vector (for
single word queries) or with type-prox weight
vector (for multiple word queries)
Combined with PageRank to give final score.
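
For the single-word case, the ranking steps can be sketched as below. The type weights, the capped-linear count-weight, and the linear blend with PageRank are all placeholder assumptions: the slides do not give the actual weights or the combining function.

```python
# Hypothetical type weights; the real values are not given in the slides.
TYPE_WEIGHTS = {"title": 8.0, "anchor": 6.0, "url": 4.0, "plain": 1.0}


def count_weight(count, cap=8):
    """Assumed count-weight: grows linearly, then stops growing past the cap."""
    return float(min(count, cap))


def ir_score(hit_counts):
    """Dot product of the count-weight vector with the type-weight vector."""
    return sum(TYPE_WEIGHTS[hit_type] * count_weight(count)
               for hit_type, count in hit_counts.items())


def final_score(hit_counts, pagerank, alpha=0.5):
    """Assumed combination: a simple weighted sum of IR score and PageRank."""
    return alpha * ir_score(hit_counts) + (1 - alpha) * pagerank


# One title hit and twenty plain-text hits for the query word.
example_counts = {"title": 1, "plain": 20}
example_ir = ir_score(example_counts)
example_final = final_score(example_counts, pagerank=2.0)
```

The cap means the twenty plain hits count no more than eight would, which keeps a page from winning by repetition alone.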

14
Results

High quality pages
zlib: 3:1 compression ratio
9 days to download 26 million pages
Indexer and crawler ran simultaneously
Future work
Query caching, smart disk allocation, updates
User context, relevance feedback

15
Footnote: foot in mouth!!

we expect that advertising funded search engines
will be inherently biased towards the advertisers
and away from the needs of the consumers.
