Search Engine Technology - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Search Engine Technology


1
Search Engine Technology 10
http://www.cs.columbia.edu/radev/SET07.html
  • March 28, 2007
  • Prof. Dragomir R. Radev
  • radev_at_umich.edu

2
SET Winter 2007
16. (Social) networks. Random graph models.
Properties of random graphs.
3
SET Winter 2007
17. Small worlds. Scale-free networks.
Power law distributions. Centrality.
4
(No Transcript)
5
(No Transcript)
6
Krebs 2004
7
(No Transcript)
8
Interleukin-2 receptor pathway protein
interaction network (from HPRD).
Peri et al., Nucleic Acids Res. 2004 January 1;
32(Database issue): D497-D501.
doi: 10.1093/nar/gkh070.
9
American Journal of Sociology, Vol. 110, No. 1.
"Chains of affection: The structure of adolescent
romantic and sexual networks," Bearman PS, Moody
J, Stovel K.
10
The New York Times May 21, 2005
11
Email network
12
Networks
  • The Web
  • Citation networks
  • Social networks
  • Protein interaction networks
  • Technological networks
  • Other networks
    • Movie actor networks
    • Co-occurrence of characters in Les Miserables
    • Board membership

13
Types of networks
  • Directed/undirected
  • Can have weights
  • Single-mode vs. bipartite (e.g., movie-actor
    graphs)

14
Semantic network
15
Dependency network
(Figure: dependency graph over the words "bought",
"Meredith", "yesterday", "apples", "green")
16
Dependency network
17
Random network
18
Lexical networks
  • A special case of networks where nodes are words
    or documents and edges link semantically related
    nodes
  • Other examples:
    • Words used in dictionary definitions
    • Names of people mentioned in the same story
    • Words that translate to the same word

19
Analyzing networks
  • Clustering coefficient
    • Watts/Strogatz cc = triangles/triples
    • Example
  • Diameter (longest shortest path)
  • Average shortest path (asp)
  • Strongly connected component (SCC)
  • Weakly connected component (WCC)
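The measures listed above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the toy graph, the function names, and the choice to count the triangles/triples ratio as closed triples over connected triples (so each triangle contributes three closed triples) are all assumptions made here. Unreachable pairs are simply skipped, so the numbers are only meaningful for a connected graph.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop counts from source to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter_and_asp(adj):
    """Return (longest shortest path, average shortest path) over reachable pairs."""
    lengths = []
    for s in adj:
        d = bfs_distances(adj, s)
        lengths.extend(l for t, l in d.items() if t != s)
    return max(lengths), sum(lengths) / len(lengths)


def transitivity(adj):
    """Closed triples divided by connected triples."""
    triples = closed = 0
    for v, nbrs in adj.items():
        for a, b in combinations(nbrs, 2):
            triples += 1          # a - v - b is a connected triple centered on v
            if b in adj[a]:
                closed += 1       # the triple is closed by the edge a - b
    return closed / triples if triples else 0.0


# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

diam, asp = diameter_and_asp(adj)
print(diam, round(asp, 3), transitivity(adj))
```

On this toy graph the diameter is 2, and three of the five connected triples are closed.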

20
Degree distribution
  • Uniform
  • Poisson
  • Power-law (with coefficient α).
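For reference, the three degree distributions can be written out as below. The notation is an assumption made here, not taken from the slides: k is the degree, k_0 a fixed degree, λ the mean degree, and α the power-law coefficient. "Uniform" is read as "every node has the same degree", as in a regular network.

```latex
\begin{align*}
\text{Uniform (regular):}\quad & P(k) =
  \begin{cases} 1 & k = k_0 \\ 0 & \text{otherwise} \end{cases} \\
\text{Poisson:}\quad & P(k) = \frac{e^{-\lambda}\,\lambda^{k}}{k!} \\
\text{Power law:}\quad & P(k) \propto k^{-\alpha}
\end{align*}
```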

21
Types of networks
  • Regular networks
    • Uniform degree distribution
  • Random networks
    • Memoryless
    • Poisson degree distribution
    • Characteristic value
    • Low clustering coefficient
    • Large asp
  • Small world networks
    • High transitivity
    • Presence of hubs (memory)
    • High clustering coefficient
      (e.g., 1000 times higher than random)
    • Small asp
    • Some are scale free
      • Immune to random attacks
      • (Very) vulnerable to targeted attacks
      • Power law degree distribution
        (typical value of α between 2 and 3)
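The "Poisson degree distribution" and "characteristic value" of a random network can be checked empirically. The sketch below assumes the Erdos-Renyi G(n, p) model, in which each pair of nodes is linked independently with probability p; the function name, parameters, and seed are illustrative choices, not part of the lecture. For a Poisson-like distribution the degree variance should be close to the mean degree, which is about p(n - 1).

```python
import random
from itertools import combinations


def erdos_renyi(n, p, seed=0):
    """Undirected G(n, p): link every pair independently with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


n, p = 1000, 0.01
adj = erdos_renyi(n, p)
degrees = [len(nbrs) for nbrs in adj.values()]

mean_k = sum(degrees) / n
var_k = sum((k - mean_k) ** 2 for k in degrees) / n

print(f"mean degree {mean_k:.2f}, variance {var_k:.2f}, "
      f"expected mean about {p * (n - 1):.2f}")
```

Degrees cluster around a single characteristic value and hubs are essentially absent, in contrast to the scale-free case.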

22
From Mark Newman 2003, "The structure and
function of complex networks"
23
Comparing the dependency graph to a random
(Poisson) graph
24
Properties of lexical networks
  • Entries in a thesaurus (Motter et al. 2002)
    • c/c0 = 260 (n = 30,000)
  • Co-occurrence networks (Dorogovtsev and Mendes
    2001, Sole and Ferrer i Cancho 2001)
    • c/c0 = 1,000 (n = 400,000)
  • Mental lexicon (Vitevitch 2005)
    • c/c0 = 278 (n = 19,340)

25
(No Transcript)
26
Graph-based representations
Square connectivity (incidence) matrix
Graph G = (V, E)
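A minimal sketch of the square matrix representation, assuming an unweighted graph given as a vertex list and an edge list. The function name and the three-node example are hypothetical; entry [i][j] is 1 when the i-th and j-th vertices are connected.

```python
def connectivity_matrix(vertices, edges, directed=False):
    """Build a |V| x |V| 0/1 matrix from a vertex list and an edge list."""
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        if not directed:
            # An undirected edge is stored symmetrically.
            matrix[index[v]][index[u]] = 1
    return matrix


# Hypothetical example: the path a - b - c.
V = ["a", "b", "c"]
E = [("a", "b"), ("b", "c")]
A = connectivity_matrix(V, E)

for row in A:
    print(row)
```

For an undirected graph the matrix is symmetric; weights could be stored in place of the 1 entries.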
27
Bipartite graphs and one-mode projections
(Figure: bipartite graph with one node set A, B, C,
D, E and a second node set 1, 2, 3, 4)
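A one-mode projection links two nodes of one set whenever they share a neighbor in the other set. The sketch below is illustrative only: the edges of the slide's figure are not recoverable from the transcript, so the membership table is invented, and weighting each projected edge by the number of shared neighbors is one common convention rather than something the slide specifies.

```python
from collections import defaultdict
from itertools import combinations


def one_mode_projection(membership):
    """Project a bipartite graph onto its lettered nodes.

    membership maps each numbered node to the set of lettered nodes it
    touches. The result maps each pair of lettered nodes to the number of
    numbered nodes they share.
    """
    weights = defaultdict(int)
    for members in membership.values():
        for u, v in combinations(sorted(members), 2):
            weights[(u, v)] += 1
    return dict(weights)


# Hypothetical edges between the node sets {1, 2, 3, 4} and {A, ..., E}.
membership = {
    1: {"A", "B"},
    2: {"B", "C", "D"},
    3: {"D", "E"},
    4: {"C", "D"},
}

projection = one_mode_projection(membership)
for pair, weight in sorted(projection.items()):
    print(pair, weight)
```

In a movie-actor graph the same projection yields the actor collaboration network, with weights counting shared films.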
28
Power laws
  • Web site size (Huberman and Adamic 1999)
  • Power-law connectivity (Barabasi and Albert
    1999): exponents 2.45 for out-degree and 2.1 for
    in-degree
  • Others: call graphs among telephone carriers,
    citation networks (Redner 1998), e.g., Erdos,
    collaboration graph of actors, metabolic pathways
    (Jeong et al. 2000), protein networks (Maslov and
    Sneppen 2002). All values of gamma are around 2-3.
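As an illustration of what an exponent such as 2.1 or 2.45 means in practice, the sketch below draws synthetic values from a continuous power law and recovers the exponent with a standard maximum-likelihood estimator. Neither the sampler nor the estimator comes from the slides; they are a swapped-in technique, and real degree data is discrete, so this is only an approximation. The names, the x_min cutoff, and the seed are assumptions.

```python
import math
import random


def sample_power_law(n, alpha, x_min=1.0, seed=0):
    """Inverse-transform sampling from p(x) proportional to x**(-alpha), x >= x_min."""
    rng = random.Random(seed)
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


def estimate_exponent(xs, x_min=1.0):
    """Maximum-likelihood estimate for a continuous power law."""
    return 1.0 + len(xs) / sum(math.log(x / x_min) for x in xs)


samples = sample_power_law(20000, alpha=2.5)
alpha_hat = estimate_exponent(samples)
print(f"estimated exponent: {alpha_hat:.2f}")
```

With this many samples the estimate should land close to the exponent used for sampling.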

29
Small-world networks
  • Diameter = average length of the shortest path
    between all pairs of nodes. Example:
    • Milgram experiment (1967)
    • Kansas/Omaha --> Boston (42/160 letters)
    • diameter = 6
  • Albert et al. 1999: average distance between two
    vertices is d = 0.35 + 2.06 log10(n). For n = 10^9,
    d = 18.89.
  • Six degrees of separation
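The Albert et al. 1999 estimate quoted above, read as d = 0.35 + 2.06 log10(n), can be checked with a short calculation. The function name is an illustrative choice.

```python
import math


def estimated_average_distance(n):
    """Average distance between two vertices, d = 0.35 + 2.06 * log10(n)."""
    return 0.35 + 2.06 * math.log10(n)


d = estimated_average_distance(10**9)
print(round(d, 2))  # 18.89, matching the value on the slide
```

Because the dependence on n is logarithmic, growing the network tenfold adds only about two steps to the average distance.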

30
Clustering coefficient
  • Cliquishness (c) between the k_v (k_v - 1)/2 pairs
    of neighbors.
  • Examples
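Written out, the definition above is the fraction of possible neighbor pairs that are actually linked. The symbol e_v, for the number of edges among the neighbors of v, is notation introduced here, and averaging over all n nodes to get a network-level value is the usual Watts/Strogatz convention rather than something stated on the slide.

```latex
\begin{align*}
c_v &= \frac{e_v}{k_v (k_v - 1)/2} = \frac{2\, e_v}{k_v (k_v - 1)},
&
C &= \frac{1}{n} \sum_{v} c_v .
\end{align*}
% Worked example: a node with k_v = 4 neighbors has 4 * 3 / 2 = 6 neighbor pairs;
% if e_v = 3 of those pairs are linked, then c_v = 3 / 6 = 0.5.
```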