Title: Web Characterization
1. Web Characterization
- Week 9
- LBSC 690: Information Technology
2. Outline
- What is the Web?
- What's on the Web?
- What is the nature of the Web?
- Preserving the Web
3. Defining the Web
- HTTP, HTML, or URL? (see the sketch after this slide)
- Static, dynamic or streaming?
- Public, protected, or internal?
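A minimal sketch of how those three pieces fit together, using only Python's standard library: the URL names the resource, HTTP is the protocol used to fetch it, and HTML is (usually) what comes back. The host example.com is just an illustration.

    from urllib.request import urlopen

    url = "http://example.com/"                 # the URL names the resource
    with urlopen(url) as response:              # urlopen issues an HTTP GET
        print(response.status)                  # HTTP status code, e.g. 200
        print(response.getheader("Content-Type"))  # usually text/html
        html = response.read().decode("utf-8", errors="replace")
    print(html[:100])                           # the start of the HTML markup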
4. Economics of the Web in 1995
- Affordable storage
- 300,000 words per dollar
- Adequate backbone capacity
- 25,000 simultaneous transfers
- Adequate last mile bandwidth
- 1 second/screen
- Display capability
- 10% of US population
- Effective search capabilities
- Lycos (now Google), Yahoo
5. Nature of the Web
- Over one billion pages by 1999
- Growing at 25% per month!
- Google indexed about 3 billion pages in 2003
- Unstable
- Changing at 1% per week
- Redundant
- 30-40% (near) duplicates
- e.g., unix man page tree
6. Source: Michael Lesk, "How Much Information Is There in the World?"
7. Number of Web Sites
8. Web Sites by Country, 2002
9. What's a Web Site?
- OCLC counts any server at port 80 (see the sketch below)
- Misses many servers at other ports
- Some servers host unrelated content
- e.g., GeoCities
- Some content requires specialized servers
- e.g., RTSP
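A rough sketch of that counting method, assuming we simply test whether a host accepts a TCP connection on port 80; the host names are illustrative, and a real census like OCLC's samples IP addresses at scale. A server listening only on, say, port 8080 would be missed, as the slide notes.

    import socket

    def answers_on_port_80(host, timeout=3):
        """True if the host accepts a TCP connection on port 80."""
        try:
            with socket.create_connection((host, 80), timeout=timeout):
                return True
        except OSError:
            return False

    for host in ["www.example.com", "www.umd.edu"]:   # illustrative hosts
        print(host, answers_on_port_80(host))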
10. World Trade in 2001
Source: World Trade Organization
11. World Trade
12. Global Internet User Population
[Chart: Internet users by language, 2000 and 2005; English and Chinese among the largest groups]
Source: Global Reach
13. Widely Spoken Languages
Source: http://www.g11n.com/faq.html
14. Source: James Crawford, http://ourworld.compuserve.com/homepages/JWCRAWFORD/can-pop.htm
15. Web Page Languages
Source: Jack Xu, Excite@Home, 1999
16. European Web Size: Exponential Growth
Source: Extrapolated from Grefenstette and Nioche, RIAO 2000
17. European Web Content
Source: European Commission, "Evolution of the Internet and the World Wide Web in Europe," 1997
18. Live Streams
Almost 2,000 Internet-accessible radio and television stations
Source: www.real.com, Feb 2000
19. Streaming Media
- SingingFish indexes 35 million streams
- 60% of queries are for music
- Then movies
- Then sports
- Then news
20. The Deep Web
- Web pages generated dynamically from databases (see the sketch below)
- Traditional search engines cannot retrieve them
- 400 to 500 times bigger than the surface Web
- The fastest-growing category of new information on the Web
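A toy sketch of why such pages are "deep": the HTML below is built only when someone submits a query that is turned into a database lookup, so there is no fixed page for a link-following crawler to discover. The database, table, and fields are made up for illustration.

    import sqlite3

    def render_results(query_term):
        """Generate an HTML results page on the fly from a database query."""
        conn = sqlite3.connect("catalog.db")      # hypothetical database
        rows = conn.execute(
            "SELECT title, author FROM books WHERE title LIKE ?",
            (f"%{query_term}%",),
        ).fetchall()
        conn.close()
        items = "".join(f"<li>{title} ({author})</li>" for title, author in rows)
        return f"<html><body><ul>{items}</ul></body></html>"

    # The page exists only per request; unless something links to
    # /search?q=whales, a crawler that follows links never sees it.
    print(render_results("whales"))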
21. Information in the Deep Web
- Relevant to most information needs
22. Deep Web
- The 60 largest deep Web sites exceed the surface Web by 40 times
23. Link Structure of the Web
24. Crawling the Web
25. Web Crawl Challenges
- Temporary server interruptions
- Discovering islands and peninsulas
- Duplicate and near-duplicate content
- Dynamic content
- Link rot
- Server and network loads
- Have I seen this page before? (see the crawler sketch below)
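A minimal crawler skeleton, standard library only, that touches a few of these challenges: it skips URLs already queued, uses a content hash to answer "have I seen this page before?", and sleeps between requests to limit server load. It is a sketch under simplifying assumptions (crude link extraction, no retry policy, no robots.txt handling, which the next slides cover), not a production crawler.

    import hashlib, re, time
    from collections import deque
    from urllib.parse import urljoin
    from urllib.request import urlopen
    from urllib.error import URLError

    def crawl(seed, max_pages=20):
        frontier = deque([seed])
        queued = {seed}           # URLs already scheduled
        seen_hashes = set()       # content fingerprints: "have I seen this page?"
        while frontier and max_pages > 0:
            url = frontier.popleft()
            try:
                html = urlopen(url, timeout=10).read()
            except URLError:
                continue          # temporary interruption; a real crawler retries later
            digest = hashlib.sha1(html).hexdigest()
            if digest in seen_hashes:
                continue          # exact duplicate content
            seen_hashes.add(digest)
            max_pages -= 1
            # Crude link extraction; a real crawler parses the HTML properly.
            for href in re.findall(rb'href="([^"#]+)"', html):
                link = urljoin(url, href.decode("ascii", errors="ignore"))
                if link.startswith("http") and link not in queued:
                    queued.add(link)
                    frontier.append(link)
            time.sleep(1)         # be gentle on servers and the network

    crawl("http://example.com/")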
26. Duplicate Detection
- Structural
- Identical directory structure (e.g., mirrors, aliases)
- Syntactic
- Identical bytes
- Identical markup (HTML, XML, ...)
- Semantic
- Identical content
- Similar content (e.g., with a different banner ad; see the sketch below)
- Related content (e.g., translated)
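One common way to catch "similar content" is to fingerprint each page by its word n-grams (shingles) and compare fingerprints with Jaccard similarity, so two copies that differ only in a banner ad still score close to 1.0. A small sketch, with an illustrative shingle size and threshold:

    import hashlib, re

    def shingles(text, k=5):
        """Set of hashed k-word shingles from a page's text."""
        words = re.findall(r"\w+", text.lower())
        return {
            hashlib.md5(" ".join(words[i:i + k]).encode()).hexdigest()
            for i in range(max(len(words) - k + 1, 1))
        }

    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0

    page = ("ls lists information about the files in the current directory "
            "and by default it sorts the entries alphabetically")
    page_with_ad = "BUY NOW " + page            # same page, extra banner text

    similarity = jaccard(shingles(page), shingles(page_with_ad))
    print(f"similarity = {similarity:.2f}")
    print("near-duplicate" if similarity > 0.8 else "different")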
27. Robots Exclusion Protocol
- Based on voluntary compliance by crawlers
- Exclusion by site
- Create a robots.txt file at the server's top level (example below)
- Indicate which directories not to crawl
- Exclusion by document (in the HTML head)
- Not implemented by all crawlers
- <meta name="robots" content="noindex,nofollow">
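Both halves of the site-level mechanism, sketched with the standard library. The robots.txt paths are made up; can_fetch() comes from Python's urllib.robotparser, and compliance is still voluntary on the crawler's part.

    # robots.txt at the server's top level, e.g. http://www.example.com/robots.txt
    #   User-agent: *
    #   Disallow: /private/
    #   Disallow: /cgi-bin/

    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser()
    rp.set_url("http://www.example.com/robots.txt")
    rp.read()                                   # fetch and parse the file
    print(rp.can_fetch("MyCrawler", "http://www.example.com/private/data.html"))
    print(rp.can_fetch("MyCrawler", "http://www.example.com/index.html"))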
28. Hands On: The Wayback Machine
- Internet Archive
- Has stored Alexa.com Web crawls since 1997
- http://archive.org
- Check out Maryland's Web site in 1997
- Check out your college's Web site back in time (or query the archive programmatically; see the sketch below)
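The Internet Archive also exposes a simple "availability" API, so the same lookup can be done programmatically. The sketch below asks for the snapshot of www.umd.edu closest to 1 January 1997; the endpoint is real, but check the Archive's documentation for the exact response fields before relying on them.

    import json
    from urllib.request import urlopen

    url = ("http://archive.org/wayback/available"
           "?url=www.umd.edu&timestamp=19970101")
    with urlopen(url) as response:
        data = json.load(response)

    closest = data.get("archived_snapshots", {}).get("closest", {})
    print(closest.get("timestamp"), closest.get("url"))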
29. Discussion Point
- Can we save everything?
- Should we?
- Do people have a right to remove things?