Title: Search Engine Technology
Search Engine Technology (10)
http://www.cs.columbia.edu/radev/SET07.html
- March 28, 2007
- Prof. Dragomir R. Radev
- radev_at_umich.edu
SET Winter 2007
16. (Social) networks, random graph models, properties of random graphs
17. Small worlds, scale-free networks, power-law distributions, centrality
Krebs 2004
Interleukin-2 receptor pathway protein interaction network (from HPRD).
Peri et al., Nucleic Acids Res. 2004 January 1; 32(Database issue): D497-D501. doi:10.1093/nar/gkh070.
American Journal of Sociology, Vol. 100, No. 1. "Chains of affection: The structure of adolescent romantic and sexual networks," Bearman PS, Moody J, Stovel K.
The New York Times, May 21, 2005
Email network
Networks
- The Web
- Citation networks
- Social networks
- Protein interaction networks
- Technological networks
- Other networks
- Movie actor networks
- Co-occurrence of characters in Les Misérables
- Board membership
Types of networks
- Directed/undirected
- Can have weights
- Single-mode vs. bipartite (e.g., movie-actor graphs)
Semantic network
Dependency network
[Figure: dependency tree over the words bought, Meredith, yesterday, apples, green]
Dependency network
Random network
Lexical networks
- A special case of networks where nodes are words or documents and edges link semantically related nodes
- Other examples
- Words used in dictionary definitions
- Names of people mentioned in the same story
- Words that translate to the same word
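Here is a minimal Python sketch of the "names of people mentioned in the same story" example. It assumes each story has already been reduced to a set of names, and both that input format and the function name `cooccurrence_network` are hypothetical. Each unordered pair of names that appear together becomes an undirected edge. The edge's weight counts how many stories the two names share.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(stories):
    """Weighted, undirected co-occurrence network.

    stories: iterable of collections of names (hypothetical input format).
    Returns a Counter mapping each unordered pair (a, b), with a < b,
    to the number of stories in which both names appear.
    """
    edges = Counter()
    for names in stories:
        # Deduplicate and sort so each unordered pair has one canonical key.
        unique = sorted(set(names))
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical stories, each reduced to the set of names it mentions.
stories = [
    {"Alice", "Bob"},
    {"Alice", "Bob", "Carol"},
    {"Carol", "Dave"},
]
network = cooccurrence_network(stories)
print(network[("Alice", "Bob")])  # 2
```

The same function applies to other lexical networks if each "story" is replaced by a sentence or a window of words.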
Analyzing networks
- Clustering coefficient
- Watts/Strogatz: cc = triangles/triples
- Example
- Diameter (longest shortest path)
- Average shortest path (asp)
- Strongly connected component (SCC)
- Weakly connected component (WCC)
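For small unweighted graphs, the path-based and component-based measures in this list can be sketched with breadth-first search. This is an illustrative sketch, not an efficient implementation. It assumes the graph is a dict that maps every node (including nodes with no out-edges) to its out-neighbors. The helper names are hypothetical, and the asp and diameter are taken over reachable ordered pairs only.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter_and_asp(adj):
    """Diameter (longest shortest path) and average shortest path (asp),
    over all ordered pairs (s, t), s != t, with t reachable from s."""
    lengths = [d for s in adj for t, d in bfs_distances(adj, s).items() if t != s]
    return max(lengths), sum(lengths) / len(lengths)


def weakly_connected_components(adj):
    """Components of the graph obtained by ignoring edge directions."""
    undirected = {u: set() for u in adj}
    for u, nbrs in adj.items():
        for v in nbrs:
            undirected[u].add(v)
            undirected[v].add(u)
    seen, components = set(), []
    for s in undirected:
        if s not in seen:
            component = set(bfs_distances(undirected, s))
            seen |= component
            components.append(component)
    return components


def strongly_connected_components(adj):
    """Groups of mutually reachable nodes. Computes reachability from every
    node (O(V * (V + E))), which is fine only for small examples."""
    reach = {u: set(bfs_distances(adj, u)) for u in adj}
    seen, components = set(), []
    for u in adj:
        if u not in seen:
            component = {v for v in reach[u] if u in reach[v]}
            seen |= component
            components.append(component)
    return components


# Directed example: a -> b -> c -> a is a cycle, c -> d -> e hangs off it, f is isolated.
graph = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["e"], "e": [], "f": []}
print(diameter_and_asp(graph))
print(strongly_connected_components(graph))
print(weakly_connected_components(graph))
```

In this example the diameter is 4 (from a to e) and the asp is 25/13, about 1.92. The SCCs are {a, b, c}, {d}, {e}, {f}, and the WCCs are {a, b, c, d, e} and {f}.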
Degree distribution
- Uniform
- Poisson
- Power-law (with coefficient α).
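The sketch below makes the Poisson and power-law cases concrete, under stated assumptions. It samples an Erdős-Rényi G(n, p) graph (an assumed random-graph model) and tabulates its degree distribution next to the Poisson pmf with mean λ = p(n - 1). It also builds a truncated discrete power law p(k) ∝ k^(-α). The function names, the seed, and the parameter values are all illustrative assumptions.

```python
import math
import random
from collections import Counter


def gnp_degrees(n, p, seed=0):
    """Degrees of an Erdos-Renyi G(n, p) graph: each pair is linked independently with probability p."""
    rng = random.Random(seed)
    deg = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                deg[i] += 1
                deg[j] += 1
    return deg


def empirical_distribution(degrees):
    """Fraction of nodes having each degree k."""
    counts = Counter(degrees)
    n = len(degrees)
    return {k: c / n for k, c in sorted(counts.items())}


def poisson_pmf(k, lam):
    """Poisson probability of degree k when the mean degree is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def power_law_pmf(alpha, k_min=1, k_max=1000):
    """Discrete power law p(k) proportional to k**(-alpha), truncated to [k_min, k_max]."""
    weights = {k: k ** -alpha for k in range(k_min, k_max + 1)}
    z = sum(weights.values())
    return {k: w / z for k, w in weights.items()}


n, p = 500, 0.01
degrees = gnp_degrees(n, p)
emp = empirical_distribution(degrees)
lam = p * (n - 1)  # mean degree of G(n, p)
for k in range(11):
    print(k, round(emp.get(k, 0.0), 3), round(poisson_pmf(k, lam), 3))
```

The Poisson column has a characteristic scale near λ. The power law instead satisfies p(1)/p(2) = 2^α for every pair of doubled degrees, which is the scale-free property.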
Types of networks
- Regular networks
- Uniform degree distribution
- Random networks
- Memoryless
- Poisson degree distribution
- Characteristic value
- Low clustering coefficient
- Large asp
- Small-world networks
- High transitivity
- Presence of hubs (memory)
- High clustering coefficient
- (e.g., 1000 times higher than random)
- Small asp
- Some are scale-free
- Immune to random attacks
- (Very) vulnerable to targeted attacks
- Power-law degree distribution
- (typical value of α between 2 and 3)
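A small experiment can probe the contrast between "immune to random attacks" and "vulnerable to targeted attacks". The sketch assumes a Barabási-Albert-style preferential-attachment model as the scale-free generator. It measures damage by the size of the largest connected component. The model choice, the parameters, and the helper names are assumptions, and the printed sizes depend on the seed. One would typically expect that removing the highest-degree hubs shrinks the largest component far more than removing the same number of randomly chosen nodes.

```python
import random
from collections import deque


def preferential_attachment(n, m, seed=0):
    """Undirected graph grown by preferential attachment: each new node links to
    m distinct existing nodes chosen with probability proportional to degree."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    # Seed graph: a complete graph on m + 1 nodes (every node has degree m).
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
    # Each node appears in `ends` once per incident edge, so a uniform pick
    # from `ends` is a degree-proportional pick of a node.
    ends = [v for v in range(m + 1) for _ in range(m)]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(ends))
        for t in chosen:
            adj[new].add(t)
            adj[t].add(new)
            ends.extend((new, t))
    return adj


def largest_component(adj, removed):
    """Size of the largest connected component after deleting the nodes in `removed`."""
    seen, best = set(removed), 0
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        size, queue = 1, deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    size += 1
                    queue.append(v)
        best = max(best, size)
    return best


n, m, k = 2000, 2, 100  # illustrative sizes: remove 5% of the nodes
g = preferential_attachment(n, m)
random_victims = random.Random(1).sample(list(g), k)
hubs = sorted(g, key=lambda v: len(g[v]), reverse=True)[:k]
print("random removal:  ", largest_component(g, random_victims))
print("targeted removal:", largest_component(g, hubs))
```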
From Mark Newman (2003), "The structure and function of complex networks"
Comparing the dependency graph to a random (Poisson) graph
Properties of lexical networks
- Entries in a thesaurus (Motter et al. 2002)
- c/c0 = 260 (n = 30,000)
- Co-occurrence networks (Dorogovtsev and Mendes 2001; Solé and Ferrer i Cancho 2001)
- c/c0 = 1,000 (n = 400,000)
- Mental lexicon (Vitevitch 2005)
- c/c0 = 278 (n = 19,340)
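The ratios c/c0 compare a network's clustering coefficient c with c0, the value expected for a random graph with the same size and mean degree. Assume an Erdős-Rényi baseline, in which each edge exists with probability p = ⟨k⟩/(n - 1). Then c0 equals p, so the ratio can be computed as below. The function names and the example inputs are hypothetical and are not taken from the studies listed.

```python
def random_baseline_clustering(mean_degree, n):
    """Clustering expected in an Erdos-Renyi graph with n nodes and the given
    mean degree: every edge is present with probability p = <k> / (n - 1)."""
    return mean_degree / (n - 1)


def clustering_ratio(c, mean_degree, n):
    """c / c0: how many times more clustered the network is than the random baseline."""
    return c / random_baseline_clustering(mean_degree, n)


# Hypothetical inputs, not from the studies above.
print(clustering_ratio(c=0.5, mean_degree=10, n=1001))  # about 50
```

Because c0 shrinks like ⟨k⟩/n, large sparse networks such as the ones listed can reach ratios in the hundreds or thousands even with moderate c.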
Graph-based representations
Square connectivity (incidence) matrix
Graph G = (V, E)
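The following sketch builds the square |V| × |V| connectivity matrix for a graph G = (V, E) that is given as a node list and an edge list. The function name and the input format are assumptions. Entry [i][j] is 1 when there is an edge from node i to node j. Undirected edges are stored in both directions, so the matrix for an undirected graph is symmetric.

```python
def connectivity_matrix(nodes, edges, directed=False):
    """Square |V| x |V| 0/1 matrix: entry [i][j] is 1 if there is an edge from
    nodes[i] to nodes[j]. Undirected edges are stored in both directions."""
    index = {v: i for i, v in enumerate(nodes)}
    size = len(nodes)
    matrix = [[0] * size for _ in range(size)]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        if not directed:
            matrix[index[v]][index[u]] = 1
    return matrix


nodes = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c")]
for row in connectivity_matrix(nodes, edges):
    print(row)
```

In the undirected case, row sums give node degrees. In the directed case, row sums give out-degrees and column sums give in-degrees.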
Bipartite graphs and one-mode projections
[Figure: bipartite graph with nodes A, B, C, D, E in one set and 1, 2, 3, 4 in the other]
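A one-mode projection of a bipartite graph links two nodes of one set whenever they share a neighbor in the other set. For example, two actors are linked if they appeared in the same movie. The sketch below weights each projected edge by the number of shared neighbors. The weighting choice, the function name, and the example edges are illustrative assumptions, and the example does not reproduce the edges of the figure.

```python
from collections import defaultdict
from itertools import combinations


def one_mode_projection(bipartite_edges):
    """Project onto the left-hand node set: link two left nodes whenever they share
    a right-hand neighbor, weighting the link by the number of shared neighbors."""
    members = defaultdict(set)  # right node -> left nodes attached to it
    for left, right in bipartite_edges:
        members[right].add(left)
    weights = defaultdict(int)
    for group in members.values():
        for a, b in combinations(sorted(group), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Hypothetical edges between letter nodes (e.g., actors) and number nodes (e.g., movies).
edges = [("A", 1), ("B", 1), ("A", 2), ("B", 2), ("C", 2), ("D", 3), ("E", 3), ("E", 4)]
print(one_mode_projection(edges))
```

To project onto the other node set instead, swap the two elements of each edge pair. Note that the projection discards information: different bipartite graphs can produce the same projected graph.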
Power laws
- Web site size (Huberman and Adamic 1999)
- Power-law connectivity (Barabási and Albert 1999): exponents 2.45 for out-degree and 2.1 for in-degree
- Others: call graphs among telephone carriers, citation networks (Redner 1998), the Erdős collaboration graph, the collaboration graph of actors, metabolic pathways (Jeong et al. 2000), protein networks (Maslov and Sneppen 2002). All values of γ are around 2-3.
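Exponents like these have to be estimated from data. One common approach is the continuous maximum-likelihood estimator gamma_hat = 1 + n / sum(ln(x_i / x_min)), popularized by Clauset, Shalizi, and Newman. It is swapped in here and is not necessarily the method used in the cited studies. The sketch checks the estimator on synthetic samples drawn by inverse-transform sampling. The values of x_min, the sample size, and the seed are assumptions.

```python
import math
import random


def estimate_power_law_exponent(xs, x_min):
    """Continuous maximum-likelihood estimate of gamma for p(x) ~ x**(-gamma), x >= x_min."""
    tail = [x for x in xs if x >= x_min]
    return 1 + len(tail) / sum(math.log(x / x_min) for x in tail)


def sample_power_law(gamma, x_min, size, seed=0):
    """Samples from p(x) ~ x**(-gamma), x >= x_min, via the inverse of the
    complementary CDF P(X > x) = (x / x_min) ** (1 - gamma)."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so the power is always finite.
    return [x_min * (1 - rng.random()) ** (-1 / (gamma - 1)) for _ in range(size)]


samples = sample_power_law(gamma=2.1, x_min=1.0, size=20000)
print(round(estimate_power_law_exponent(samples, x_min=1.0), 2))
```

Real degree data are discrete and only follow a power law above some x_min, so in practice both x_min and the goodness of fit need checking.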
Small-world networks
- Diameter: average length of the shortest path between all pairs of nodes. Example:
- Milgram experiment (1967)
- Kansas/Omaha → Boston (42/160 letters)
- diameter = 6
- Albert et al. 1999: average distance between two vertices is d = 0.35 + 2.06 log₁₀ n. For n = 10⁹, d = 18.89.
- Six degrees of separation
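The Albert et al. fit can be evaluated directly. The function name is hypothetical, and the formula is the one quoted above.

```python
import math


def web_average_distance(n):
    """Average distance between two vertices, d = 0.35 + 2.06 * log10(n), as fitted by Albert et al. (1999)."""
    return 0.35 + 2.06 * math.log10(n)


print(round(web_average_distance(10 ** 9), 2))  # 18.89
```

Because d grows with log n, multiplying the number of pages by 10 adds only about 2 hops to the average distance.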
Clustering coefficient
- Cliquishness (c): the fraction of the k_v(k_v - 1)/2 pairs of neighbors of a node v that are themselves connected.
- Examples
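The following sketches the cliquishness computation for an undirected graph stored as a dict of neighbor sets; the representation and the function names are assumptions. `local_clustering` divides the number of linked neighbor pairs by k_v(k_v - 1)/2. Nodes with fewer than two neighbors are assigned 0 here, which is one convention among several. `transitivity` is the global triangles/triples ratio, which in general differs from the average of the local values.

```python
from itertools import combinations


def local_clustering(adj, v):
    """Fraction of the k_v (k_v - 1) / 2 pairs of v's neighbors that are themselves linked."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention: undefined for k_v < 2, reported as 0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


def average_clustering(adj):
    """Watts-Strogatz clustering coefficient: the mean of the local values."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


def transitivity(adj):
    """Global ratio of closed triples to connected triples (3 x triangles / triples)."""
    closed = triples = 0
    for nbrs in adj.values():
        k = len(nbrs)
        triples += k * (k - 1) // 2
        closed += sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return closed / triples


# Triangle a-b-c plus a pendant node d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(local_clustering(adj, "c"), average_clustering(adj), transitivity(adj))
```

In this example, node c has 1 of its 3 neighbor pairs linked. The average of the local values is 7/12, while the global ratio is 3/5.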