Graph Structure in the Web - PowerPoint PPT Presentation

About This Presentation
Title:

Graph Structure in the Web

Description:

Graph Structure in the Web. Alta Vista. IBM Almaden. Compaq SRC. 9th WWW Conference http://www9.org. Vamsi Vutukuru, Anuj Khare ... – PowerPoint PPT presentation


Transcript and Presenter's Notes



1
Graph Structure in the Web
  • Alta Vista
  • IBM Almaden
  • Compaq SRC
  • 9th WWW Conference, http://www9.org

Vamsi Vutukuru, Anuj Khare
2
Presentation Structure
  • Introduction
  • Motivation
  • Prior Work
  • Experiments
  • Infrastructure
  • Algorithms run
  • Results
  • Interpretation
  • Future Work

3
Motivation
Why study the Web graph?
  • Crawl strategies
  • Behavior of Web Algorithms
  • HITS, PageRank
  • Evolution of Web
  • Webrings, Bipartite cores

4
Prior Work
  • Observations of power law distributions
  • Kumar et al.: 40 million pages (1997 crawl)
  • Barabasi et al.: 325K nodes, nd.edu domain (1999)
  • Diameter of the Web: 19
  • Graph-theoretic methods
  • Kleinberg, PageRank: search
  • Mendelzon and Wood: Web mining (1995)

5
Terminology
  • Directed Graph
  • Out degree, in-degree
  • Strongly Connected Component (SCC)
  • Weakly Connected Component (WCC)
  • BFS
  • Diameter
  • max Shortest_Path(u,v) over all u, v ∈ V
  • average Shortest_Path(u,v) over all u, v ∈ V
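
The two path measures above can be sketched with a plain BFS from every node. This is a minimal illustration on a hypothetical four-node toy graph, not the code used in the study; the function names are assumptions, and pairs with no directed path are simply left out of both the maximum and the average.

```python
from collections import deque

def bfs_distances(graph, source):
    """Return {node: hop count} for every node reachable from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_statistics(graph):
    """Max and mean shortest-path length over ordered pairs (u, v), u != v,
    for which a directed path exists."""
    lengths = []
    for u in graph:
        for v, d in bfs_distances(graph, u).items():
            if v != u:
                lengths.append(d)
    return max(lengths), sum(lengths) / len(lengths)

# Hypothetical toy graph: a directed 4-cycle a -> b -> c -> d -> a.
toy = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["a"]}
diameter, average = path_statistics(toy)
```

On the 4-cycle every node sees the other three at distances 1, 2 and 3, so the maximum is 3 and the average is 2. An all-pairs BFS like this is only practical for small graphs.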

6
Infrastructure
  • Data
  • 2 AltaVista crawls: May 1999, October 1999
  • 203 million web-pages/nodes
  • 1.5 billion links/edges
  • Connectivity Server 2 (CS2)
  • 465 MHz Compaq AlphaServer 4100
  • 12GB RAM
  • BFS reaching 100 million nodes: 4 minutes

7
Algorithms Run
  • BFS
  • WCC algorithm
  • SCC algorithm
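
The slide names the WCC and SCC computations without saying which algorithms were used. As a sketch under that caveat, the block below uses union-find for weak components and Kosaraju's two-pass method for strong components; both choices, the function names, and the toy edge list are assumptions made for illustration.

```python
from collections import defaultdict

def weakly_connected_components(nodes, edges):
    """Group nodes with union-find, ignoring edge direction."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return list(groups.values())

def strongly_connected_components(nodes, edges):
    """Kosaraju's two-pass algorithm, written iteratively."""
    fwd, rev = defaultdict(list), defaultdict(list)
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)
    # Pass 1: record nodes in order of DFS completion on the forward graph.
    order, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(fwd[start]))]
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(fwd[nxt])))
                    break
            else:
                order.append(node)
                stack.pop()
    # Pass 2: sweep the reversed graph in decreasing finish order.
    components, assigned = [], set()
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        comp, stack = set(), [start]
        while stack:
            node = stack.pop()
            comp.add(node)
            for prev in rev[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(comp)
    return components

# Hypothetical toy graph: a 3-cycle, one node hanging off it, one isolated node.
nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
wcc = sorted(sorted(c) for c in weakly_connected_components(nodes, edges))
scc = sorted(sorted(c) for c in strongly_connected_components(nodes, edges))
```

Here the weak components are {1, 2, 3, 4} and {5}, while the strong components split further into {1, 2, 3}, {4} and {5}.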

8
Results
  • Degree distributions

Power law for in-degree: the probability that a node has in-degree i
is proportional to 1/i^x
In-degree exponent: 2.1
Out-degree exponent: 2.72
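
One rough way to read an exponent off a degree histogram is a least-squares line through the log-log plot. The sketch below assumes that approach (the slide does not say how the exponents were fitted) and runs it on a synthetic histogram constructed to follow 1/i^2.1, not on crawl data; the function names are hypothetical.

```python
import math
from collections import Counter

def in_degree_counts(edges):
    """Map in-degree i -> number of nodes with that in-degree (i = 0 omitted)."""
    indeg = Counter(v for _, v in edges)
    return Counter(indeg.values())

def fit_power_law_exponent(counts):
    """Least-squares slope of log(count) against log(degree).
    Returns x such that count is roughly proportional to 1/i^x."""
    pts = [(math.log(i), math.log(c)) for i, c in counts.items() if i > 0 and c > 0]
    n = len(pts)
    mean_x = sum(px for px, _ in pts) / n
    mean_y = sum(py for _, py in pts) / n
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in pts)
    sxx = sum((px - mean_x) ** 2 for px, _ in pts)
    return -sxy / sxx

# Synthetic histogram (not crawl data) built to follow 1/i^2.1 exactly.
synthetic = {i: 1e6 / i ** 2.1 for i in range(1, 101)}
estimate = fit_power_law_exponent(synthetic)
```

Because the synthetic counts lie exactly on a power law, the fit recovers 2.1 up to floating-point error. On real, noisy histograms a log-log regression is known to be a crude estimator, so it should be read as an illustration of the relationship rather than a recommended fitting method.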
9
(No Transcript)
10
Undirected Connected Components
Giant WCC: 186 million nodes (91%)
Is this because of junctions?
Remove nodes with in-degree 5 or higher: WCC still has 59 million nodes
Power-law exponent: 2.5
Connectivity is resilient: hubs and authorities are embedded in a
graph that is well connected even without them
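
The resilience test on this slide can be sketched as: drop the high in-degree nodes, then measure the largest weak component that survives. The block below assumes the cut removes every node whose in-degree is at or above a threshold k (the comparison symbol did not survive on the slide); the function name and toy graph are hypothetical.

```python
from collections import Counter, defaultdict

def largest_wcc_after_pruning(edges, k):
    """Drop every node whose in-degree is >= k, then return the size of the
    largest weakly connected component among the remaining nodes."""
    indeg = Counter(v for _, v in edges)
    nodes = {n for e in edges for n in e}
    kept = {n for n in nodes if indeg[n] < k}
    adj = defaultdict(set)
    for u, v in edges:
        if u in kept and v in kept:
            adj[u].add(v)
            adj[v].add(u)
    best, seen = 0, set()
    for start in kept:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best

# Hypothetical toy graph: hub "h" is linked from a, b, c, d, which also form a chain.
edges = [("a", "h"), ("b", "h"), ("c", "h"), ("d", "h"),
         ("a", "b"), ("b", "c"), ("c", "d")]
with_hub = largest_wcc_after_pruning(edges, 10)    # nothing removed
without_hub = largest_wcc_after_pruning(edges, 4)  # hub removed
```

In the toy graph the chain keeps a, b, c and d connected after the hub is removed, which mirrors the slide's point in miniature: the component shrinks but does not fall apart.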
11
Strongly Connected Components
SCC: 56 million pages (28%)
Where have the other pages gone?
Power-law exponent: 2.5
12
Bowtie
13
Experiments
  • Random-start BFS
  • 570 randomly chosen start nodes
  • Forward BFS
  • Backward BFS
  • Start nodes
  • BFS dies out with 90 nodes
  • BFS explodes to cover 100 million nodes
  • Both Forward and Backward BFS explode
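
The experiment above can be sketched as a forward and a backward BFS from each sampled start node, recording how many nodes each search reaches. This is an illustrative sketch, not the study's code: the function names, the seed parameter, and the toy graph are assumptions.

```python
import random
from collections import defaultdict, deque

def reach_count(adj, start):
    """Number of nodes reached by BFS from start (start included)."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen)

def random_start_bfs(edges, num_starts, seed=0):
    """Map each sampled start node to (forward reach, backward reach)."""
    out_adj, in_adj = defaultdict(list), defaultdict(list)
    nodes = set()
    for u, v in edges:
        out_adj[u].append(v)   # follow links forward
        in_adj[v].append(u)    # follow links backward
        nodes.update((u, v))
    rng = random.Random(seed)
    starts = rng.sample(sorted(nodes), num_starts)
    return {s: (reach_count(out_adj, s), reach_count(in_adj, s)) for s in starts}

# Hypothetical toy graph: "in" -> {s1 <-> s2} -> "out".
edges = [("in", "s1"), ("s1", "s2"), ("s2", "s1"), ("s2", "out")]
results = random_start_bfs(edges, num_starts=4)
```

With all four nodes sampled, "in" reaches everything forward but only itself backward, "out" shows the reverse, and the two core nodes reach most of the graph in both directions. That is the same three-way pattern the slide lists for start nodes.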

14
(No Transcript)
15
Interpretation
WCC: 186 million nodes
SCC: 56 million nodes
DISC = TOTAL - WCC
SCC + IN: forward BFS explodes
SCC + OUT: backward BFS explodes
16
IN, OUT and TENDRILS
  • Every BFS start node in SCC reaches
  • 99,807,161 nodes through in-link expansion
  • Hence SCC + IN ≈ 100 million
  • 99,630,178 nodes through out-link expansion
  • Hence SCC + OUT ≈ 100 million

TENDRILS = WCC - (SCC + IN + OUT)
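
The decomposition on this slide follows from three searches started at any node of the giant SCC: forward (out-links), backward (in-links), and undirected. The sketch below assumes such a core node is already known and treats everything left over in the weak component as TENDRILS; the function names and toy edges are hypothetical.

```python
from collections import defaultdict, deque

def reach(adj, start):
    """Set of nodes reachable from start by BFS over adj."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def bow_tie(edges, core_node):
    """Split nodes into SCC, IN, OUT, TENDRILS and DISC relative to core_node,
    which is assumed to lie in the giant SCC."""
    out_adj, in_adj, und_adj = defaultdict(list), defaultdict(list), defaultdict(list)
    nodes = set()
    for u, v in edges:
        out_adj[u].append(v)
        in_adj[v].append(u)
        und_adj[u].append(v)
        und_adj[v].append(u)
        nodes.update((u, v))
    forward = reach(out_adj, core_node)   # SCC + OUT
    backward = reach(in_adj, core_node)   # SCC + IN
    wcc = reach(und_adj, core_node)       # the whole weak component
    scc = forward & backward
    return {
        "SCC": scc,
        "IN": backward - scc,
        "OUT": forward - scc,
        "TENDRILS": wcc - (forward | backward),
        "DISC": nodes - wcc,
    }

# Hypothetical toy graph with one node or pair in each region.
edges = [("i", "a"), ("a", "b"), ("b", "a"), ("b", "o"), ("i", "t"), ("x", "y")]
parts = bow_tie(edges, "a")
```

In the toy graph, a and b form the core, i feeds into it, o is fed by it, t hangs off IN without reaching the core, and the x to y edge is disconnected from everything else.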
17
IN and OUT
128 start nodes in IN, 134 start nodes in OUT.
OUT tends to encounter larger neighborhoods.
18
SCC
136 start nodes in SCC
BFS depth
Directed diameter of the SCC is at least 28
19
More Observations
Given random start and finish pages, how likely
are we to get from the start page to the finish page?
24%
Maximum finite shortest path length: 475 + 430 = 905
Average connected distance
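
The question on this slide, how often a directed path exists between a random start and finish page, can be illustrated by counting connected ordered pairs. The sketch below checks every ordered pair of a hypothetical four-node bow-tie graph; on a graph of Web scale one would presumably sample pairs instead, and the names used here are assumptions.

```python
from collections import deque
from itertools import permutations

def has_path(graph, source, target):
    """True if a directed path leads from source to target."""
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return True
        for v in graph.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False

# Hypothetical toy bow-tie: i -> {a <-> b} -> o.
toy = {"i": ["a"], "a": ["b"], "b": ["a", "o"], "o": []}
pairs = list(permutations(toy, 2))            # all ordered pairs, u != v
connected = sum(has_path(toy, u, v) for u, v in pairs)
fraction = connected / len(pairs)
```

Seven of the twelve ordered pairs are connected here. Even in this tiny example the bow-tie shape leaves a large share of pairs with no path, since nothing leads back out of "o" or into "i".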
20
Future Work
  • Further analysis of SCC, IN, OUT, TENDRILS
  • Is the structure stable?
  • Mathematical models for evolving graphs
  • Applicability to phone-call graphs,
    purchase/transaction graphs, etc.
  • Explore other notions of connectivity
  • co-citation
  • bibliographic coupling

21
(No Transcript)