Title: Evolving Networks
1Evolving Networks
- Jean-Loup Guillaume
- NPA team / LIP6 / UPMC - France
- Joint works with
- E. Fleury, C. Robardet, A. Scherrer,
- M. Latapy and S. Le Blond.
2Outline
- Typical evolving networks/applications
- peer to peer networks
- web graphs
- internet networks
- phone calls networks
- Measurement issues
- Summary of classical approaches
- A case study: proximity sensor network
- degrees and their evolution
- evolution of components and social groups.
3Peer-to-peer networks
- Systems used to share files or computer resources,
- to connect people (Skype?)...
- A typical P2P system/protocol allows
- addition of new clients/files in a simple way
- effective search of contents in the network
- resilience: clients can leave unexpectedly
- users should not be overloaded.
- Two main approaches
- centralised: a central server records all files and answers all queries
- distributed: peers are organised in an overlay network and are cooperatively in charge of all operations.
4Peer-to-peer networks (cont.)
- Typical figures for an eDonkey server over 48 hours
- around 50 000 clients connected (small server)
- 1.5 million connections/disconnections
- 210 million source searches (who shares file X?).
- P2P networks viewed as graphs
- links between peers in an overlay network
- efficient search of files, resilience to departure...
- in general not related to geography or peers' interests.
- file exchanges
- communities of users with similar interests
- links between files
- communities of files, recommendation systems...
5Peer-to-peer networks (cont.)
- Degree-degree correlations
- File exchanges (directed network)
- Out-degree: how many files you are looking for.
- In-degree: how many files you are sharing.
- Global time aggregation
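A minimal sketch of the two degrees defined above, under the assumption that the exchange log is a list of (requester, provider, file) records. The record format and all names are hypothetical; the slides do not describe the actual log layout.

```python
from collections import defaultdict

def file_degrees(log):
    """Return (out_degree, in_degree) per peer from (requester, provider, file_id) records."""
    wanted = defaultdict(set)   # peer -> distinct files it asked for
    shared = defaultdict(set)   # peer -> distinct files it provided
    for requester, provider, file_id in log:
        wanted[requester].add(file_id)
        shared[provider].add(file_id)
    out_degree = {peer: len(files) for peer, files in wanted.items()}
    in_degree = {peer: len(files) for peer, files in shared.items()}
    return out_degree, in_degree

# Toy log aggregated over the whole measurement (global time aggregation).
log = [("a", "b", "f1"), ("a", "c", "f2"), ("b", "c", "f2"), ("a", "b", "f1")]
out_degree, in_degree = file_degrees(log)
```

Repeated queries for the same file are counted once here, which is one possible reading of "how many files".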
6 Peer-to-peer networks (cont.)
- Degree evolution
- Most popular clients
- number of queries received: linear in time (repeated automatically by software)
- number of queries received for different files converges very fast: few popular files.
7Web graphs
- Hyperlinks between web pages.
- Size and evolution
- billions of web pages
- pages/links can be created/modified/removed
- millions of modifications every day.
- a lot of dynamic content.
- Search engines (Google)
- use text-mining and link analysis to rank pages
- a page is good if linked to by many good pages.
- need to know the structure of the network
- fast evolving pages might be more relevant?
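The idea that "a page is good if linked to by many good pages" can be sketched as a power iteration. This is an illustrative version only: the damping value 0.85 and the uniform handling of pages without outgoing links are conventional assumptions, not details taken from the slides.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict page -> list of pages it links to. Returns dict page -> score."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # a page passes its score on, split evenly among its links
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # page without outgoing links: spread its score uniformly
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank

ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]})
```

In this toy graph, page c is linked to by three pages and gets the highest score; page a, linked to only by c, comes second because c is itself "good".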
8Web graphs (cont.)
- Ranking pages (PageRank)
- need a good, up-to-date knowledge of the network
- avoid "file not found" errors.
- visit web pages often enough, but not too often
- need to know if some pages/portions of the web are evolving faster or slower
- web page classification?
- Web spam detection
- text mining
- detect specific subnetworks (cliques...) of fake pages used to artificially increase the rank
- can use the dynamics to detect substructures with an unnatural rate of appearance
- a 10-clique can exist, but one that appears within a single day is suspicious!
9Internet network
- Set of machines/routers with physical links
- Size and evolution
- millions of machines (billions as soon as GSM phones, cars, fridges, etc. get equipped with wireless)
- evolution is slow, but routing changes very often (failures, congestion, load balancing).
- Security issues
- given a set of routing tables/networks, can you detect outliers?
- online detection of attacks (DDoS for instance) through the observation of routing evolution?
10Phone calls networks
- Who calls whom
- typically millions of customers for a company A
- a few calls/SMS/MMS per day for each user (from company A to A, A to B, B to A; calls from B to C are unknown)
- use of non-network services (internet, logo or music downloading, etc.) with mobiles
- information is kept for billing purposes
- customer information can also be used for marketing purposes: segmentation...
- Main objectives
- keep customers and get new ones
- create new services and sell them to their clients.
11Typical evolution
- Aggregation per day
- number of calls and sms per day
- sociological effects: weekends plus specific days (Christmas, New Year, Valentine's Day, etc.)
- calls and SMS are complementary.
12Phone calls networks (cont.)
- Churn prediction (3 types of churn)
- more than 20% churn every year
- Sociological network approaches
- strong correlations for operator, geographical distance, age and even handset brand
- evolution of call patterns
- use of data-mining/feature selection approaches.
- Acquire new customers
- every time a client x from company A calls or is called by a client y from company B, company A gets some knowledge about y
- A can offer specific prices to y if A thinks that y is willing to churn.
13Phone calls networks (cont.)
- Diffusion of innovation/viral marketing
- Use word of mouth
- give a specific offer to one person and encourage them to talk about it to their friends (who might get the same offer)
- some services are more likely to be diffused (person-to-person services, e.g. SMS, mail, etc.)?
- Can be observed on phone call networks
- Many SMS/MMS sent are only responses to previous SMS/MMS.
- Diffusion effect clearly visible.
- Live experiments in many countries based on graph/data mining approaches
- good preliminary results (response rate much greater than expected).
14Outline
- Typical evolving networks/applications
- peer to peer networks
- web graphs
- internet networks
- phone calls networks
- Measurement issues
- Summary of classical approaches
- A case study: proximity sensor network
- degrees and their evolution
- evolution of components and social groups.
15Measurement issues
- Some networks are simply log files,
- many are obtained through measurements
- Measuring evolution is hard in general
- Example of Web graphs
- one machine with a good Internet connection can capture a few million web pages every day
- a number of pages of the same order have been modified or created during this day.
- Two solutions
- study long-scale evolution or small subnetworks
- (ask or be Google).
16Measurement issues (cont.)
- Data quality is always an issue.
- Reliability
- Who made the measurement?
- What proportion has been measured, how long did it take and what is the evolution rate?
- Are there constraints which might bias the result?
- Technological, biological,
- e.g. if a web page is not linked to, it cannot be found by following links.
- Can it be reproduced?
17Measurement issues (cont.)
- Approximation of the quality of Internet maps
- number of sources/destinations
- for many parameters.
- In general, studied networks are incomplete and biased by the measurement process
- work on bias removal and,
- work on the biased data.
18Outline
- Typical evolving networks/applications
- peer to peer networks
- web graphs
- internet networks
- phone calls networks
- Measurement issues
- Summary of classical approaches
- A case study: proximity sensor network
- degrees and their evolution
- evolution of components and social groups.
19Aggregation
- Consider the aggregated graph rather than its evolution
- often used when the measurement process is too long or too hard to get many snapshots.
- many parameters are available to describe a static network
- centrality of nodes (degree, betweenness, etc.)
- communities or typical subnetworks
- correlations between properties
- etc.
- Aggregation can be done on a smaller scale
- minute, day, month
- depends on the phenomenon under study.
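Aggregation at a chosen scale can be sketched as follows, assuming the data is a list of (time, u, v) contact records with integer times. The record format, the window length and the names are illustrative assumptions.

```python
from collections import defaultdict

def aggregate(contacts, window):
    """Group (time, u, v) records into windows of the given length.
    Returns {window_index: set of undirected links seen in that window}."""
    snapshots = defaultdict(set)
    for t, u, v in contacts:
        snapshots[t // window].add(frozenset((u, v)))
    return dict(snapshots)

contacts = [(0, "a", "b"), (30, "b", "a"), (70, "b", "c"), (130, "a", "c")]
per_minute = aggregate(contacts, 60)                     # one snapshot per 60 time units
fully_aggregated = set().union(*per_minute.values())    # the single static graph
```

Changing `window` moves between the fully aggregated graph (one very large window) and fine-grained snapshots; which value is sensible depends on the phenomenon under study.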
20Evolution of static properties
- Consider a static property and plot its evolution through time
- number of nodes, links, etc.
- tools from signal processing can be used
- long range dependence (process with memory)
- detection of anomalies
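As a rough illustration, the evolution of two static properties can be computed per snapshot and scanned for unusual values. The threshold rule below (distance from the mean in standard deviations) is a deliberately crude stand-in chosen for brevity, not one of the signal-processing tools the slide refers to; names and data are hypothetical.

```python
from statistics import mean, pstdev

def property_series(snapshots):
    """snapshots: list of sets of (u, v) links, one set per time step.
    Returns two lists: number of non-isolated nodes and number of links per step."""
    n_nodes = [len({x for link in links for x in link}) for links in snapshots]
    n_links = [len(links) for links in snapshots]
    return n_nodes, n_links

def outlier_steps(series, k=2.0):
    """Indices whose value lies more than k standard deviations from the mean."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return []
    return [i for i, x in enumerate(series) if abs(x - mu) > k * sigma]

quiet = {("a", "b")}
burst = {("a", "b"), ("b", "c"), ("c", "d"), ("a", "d"), ("a", "c")}
snapshots = [quiet] * 9 + [burst]
n_nodes, n_links = property_series(snapshots)
suspicious = outlier_steps(n_links)
```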
21Study specific users/phenomenon
- Outliers, central individuals, bridges...
- churn prediction
- viral marketing or diffusion over the network
22Define new properties
- Properties which capture the evolution (but cannot be defined on a static network)
- time-connected components: sets of users who can reach each other using the evolution of the network.
- ...
23Drawing as a tool
- A good drawing algorithm might
- give a good view of the overall structure
- highlight specific parts of the network (anomalies).
- Specific parts can be studied later on.
- Some simple approaches
- draw the fully aggregated network and, for each time (or aggregation) step, only draw the corresponding subnetwork
- display through time a matrix representing the network (adjacency/weight/etc.)
- display a (time x space) matrix.
26image courtesy of C. de Kerchove
27Outline
- Typical evolving networks/applications
- peer to peer networks
- web graphs
- internet networks
- phone calls networks
- Measurement issues
- Summary of classical approaches
- A case study: proximity sensor network
- degrees and their evolution
- evolution of components and social groups.
28Context
- Many devices with wireless capabilities
- computers, mobile phones, PDA
- ambient wireless network
- pair wise contacts, intermittent connectivity.
- Nodes spread around, mobile
- data is transmitted in a multihop fashion
29Current technology
30Goals
- Goal of a network: transmit information
- proximity is important (radio medium)
- Performance / reliability / connectivity
- rely on the underlying network and the mobility
- need to better understand the evolution and
prepare for scalability issues.
31Mobility
- Only consider geographical proximity
- need to know movements
- proximity can be deduced from movement and initial positions.
- How are people moving
- randomly, group movement, something else?
- How to measure it
- geolocalisation (GPS)
- expensive: every person/machine must be equipped.
- using GSM: approximate position.
- Exact position is not important, proximity is
- use proximity sensors.
32Constraints
- Geographical proximity imposes constraints
- many contacts => some contacts must be near
- anything else?
- This is not going to be considered here.
33Available (not so massive) data
A. Chaintreau et al., WDTN 2005
- Infocom 2005 conference
- 54 sensors (11 out of order, 2 lost) => 41 (small)
- 3 days (short)
- very specific situation.
- Bluetooth sensors
- Seek contacts (5 s)
- Wait for answers (108-132 s).
- Data
- For each time step (0-250000), a set of links
- All links have been symmetrised.
- Few other similar datasets are available.
35Evolution of the network
- Sociological effects
- Day/night/breaks
- Lots of small variations.
- 50% of isolated nodes (day)
- Maximum of 34 nodes connected (in 1 CC)
36One day typical evolution
37Evolution of the network (cont)
- Positive correlation between the number of nodes and the number of links
- For a given number of nodes, there exists a large number of possible configurations, from sparse to dense
38Random process - contacts
- Contact duration: α = 1.4
- Inter-contact duration: α = 0.41
- Straight line
- over a wide range: [300, 20k] and [100, 30k]
- power law distribution
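The two duration samples behind these distributions can be extracted from the list of time steps at which a link is present. The sketch below assumes integer time steps and one sorted list per link; both are assumptions about the data layout.

```python
def durations(present_steps):
    """present_steps: sorted time steps at which one link is up.
    Returns (contact, inter_contact): lists of durations, in time steps."""
    contact, inter_contact = [], []
    if not present_steps:
        return contact, inter_contact
    start = prev = present_steps[0]
    for t in present_steps[1:]:
        if t != prev + 1:                       # the link went down between prev and t
            contact.append(prev - start + 1)    # length of the contact that just ended
            inter_contact.append(t - prev - 1)  # length of the gap before the next one
            start = t
        prev = t
    contact.append(prev - start + 1)            # last contact, possibly cut by the end of the trace
    return contact, inter_contact

contact, inter_contact = durations([1, 2, 3, 7, 8, 20])
```

Pooling these lists over all links gives the samples whose distributions would be plotted; fitting the exponents is a separate step not shown here.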
39Random process - degrees
- Analyze the differential sequence
- $D_k = S_{k+1} - S_k$, where $S_k$ is the original data sequence
- Covariance / wavelet based tool
- No long-range dependence / similar to a random
process.
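The differential sequence is a first difference of the original sequence. The sketch below also computes a plain sample autocorrelation, which is only a simple substitute for the covariance / wavelet-based tool mentioned on the slide; the degree values are made up.

```python
def differential(sequence):
    """D_k = S_{k+1} - S_k for every consecutive pair of the original sequence S."""
    return [b - a for a, b in zip(sequence, sequence[1:])]

def autocorrelation(values, lag):
    """Sample autocorrelation at the given lag (not the wavelet-based estimator)."""
    n = len(values)
    mu = sum(values) / n
    var = sum((x - mu) ** 2 for x in values)
    if var == 0:
        return 0.0
    return sum((values[i] - mu) * (values[i + lag] - mu) for i in range(n - lag)) / var

degrees = [3, 4, 4, 2, 5, 5, 1]      # hypothetical degree of one node over time
steps = differential(degrees)
lag1 = autocorrelation(steps, 1)
```

For a sequence with no memory, the autocorrelation at non-zero lags is expected to stay close to zero on long traces.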
40Random process - degrees
- Covariance / wavelet-based tool (*) to obtain a spectral log-log representation of the covariance in the wavelet domain
- j is the scale
- $S_j$ is roughly the average of the wavelet coefficients at scale j
- Power law => long-range dependence rather than high variability
- Estimated exponent => Hurst exponent is close to the special value 0.5
- => no long-range dependence => Independent and Identically Distributed (IID)
[Figure: covariance of the differential sequence in the wavelet domain]
(*) P. Abry and D. Veitch, Wavelet analysis of long-range dependent traffic, IEEE Transactions on Information Theory, 1998
41Connected components
- At each time step: network = set of CCs
- groups of people who can communicate.
- CC stability/structure
- stable => possibility of long communications.
- too stable => cannot communicate outside the CC.
- structure: information on the number of hops, the number of radio conflicts you might face...
42Connected components (cont.)
- For each time step, compute every CC
- 2 similar sets of nodes but different sets of links are different CCs (routing has to be modified)
- very different CCs are observed.
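A small sketch of this convention: each connected component is identified by both its node set and its link set, so the same nodes with different links count as a different CC. The traversal is an ordinary depth-first search; names and the input format are illustrative.

```python
def components(links):
    """links: list of (u, v) pairs present at one time step.
    Returns a list of (nodes, links) pairs, both as frozensets."""
    adjacency = {}
    for u, v in links:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    seen, result = set(), []
    for source in adjacency:
        if source in seen:
            continue
        stack, nodes = [source], {source}
        while stack:
            x = stack.pop()
            for y in adjacency[x]:
                if y not in nodes:
                    nodes.add(y)
                    stack.append(y)
        seen |= nodes
        # a link belongs to the component of its endpoints
        cc_links = frozenset(frozenset(link) for link in links if link[0] in nodes)
        result.append((frozenset(nodes), cc_links))
    return result

step1 = components([("a", "b"), ("b", "c"), ("d", "e")])
step2 = components([("a", "b"), ("b", "c"), ("a", "c"), ("d", "e")])
unchanged = set(step1) & set(step2)
```

Here the component on {a, b, c} gains one link between the two steps and is therefore treated as a new CC, while {d, e} is unchanged. Isolated nodes do not appear because the input only lists links.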
43CCs - Density
- Set of connected nodes and links
- Small components: strong variation of density.
- Big components: low density
- max(nb_links) = 4.5 × nb_nodes
- Day: one giant component and a few very small ones.
- Night: many small components (mainly isolated links)
44CCs - Stability
- Strong heterogeneity, most components
- Appear only a few times.
- Have a very short cumulated lifetime.
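The two stability measures can be sketched as below. The definitions used here are assumptions: the cumulated lifetime is taken as the total number of time steps in which a component is present, and an appearance as a maximal run of consecutive time steps.

```python
def stability(history):
    """history: list (one entry per time step) of sets of component identifiers.
    Returns {component: (appearances, cumulated_lifetime)}."""
    stats = {}
    previous = set()
    for current in history:
        for cc in current:
            appearances, lifetime = stats.get(cc, (0, 0))
            if cc not in previous:      # the component was absent at the previous step
                appearances += 1
            stats[cc] = (appearances, lifetime + 1)
        previous = current
    return stats

history = [{"A"}, {"A", "B"}, {"B"}, {"A"}, set(), {"A"}]
stats = stability(history)
```

Any hashable identifier works as a component, for example a (node set, link set) pair of frozensets.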
45CCs - Stability (cont.)
- Large components (given set of nodes/links)
- Have a very short cumulated lifetime (12 nodes / 100 sec max)
- Rarely appear more than once.
- Global dynamic effects impact on large CCs
- However
- Large components mainly encounter small
modifications.
46CCs - Isomorphisms
- Consider nodes and links or nodes only?
47CC - Isomorphisms (cont.)
- Isomorphic components over-represented?
- 1 1 1 3 (star): 17.9% (10.5%)
- 1 1 2 2 (cycle-1): 36.1% (31.6%)
- 1 2 2 3 (star+1): 27.2% (31.6%)
- 2 2 2 2 (cycle): 2.7% (7.9%)
- 2 2 3 3 (cycle+1): 12.4% (15.8%)
- 3 3 3 3 (clique): 3.8% (2.6%)
- In general
- Very high and low density over-represented.
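The values in parentheses are consistent with picking a connected graph on 4 labelled nodes uniformly at random. This reading is an inference from the numbers, not something stated on the slide. The sketch below enumerates all such graphs and groups them by degree sequence, which is enough to tell the six shapes apart for 4 nodes.

```python
from collections import Counter
from itertools import combinations

def is_connected(n, edges):
    """True if the graph on nodes 0..n-1 with the given edges is connected."""
    adjacency = {i: set() for i in range(n)}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    stack, seen = [0], {0}
    while stack:
        x = stack.pop()
        for y in adjacency[x] - seen:
            seen.add(y)
            stack.append(y)
    return len(seen) == n

def degree_sequence(n, edges):
    """Sorted tuple of node degrees."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return tuple(sorted(degree))

n = 4
possible = list(combinations(range(n), 2))       # the 6 possible links
counts = Counter()
for k in range(len(possible) + 1):
    for edges in combinations(possible, k):
        if is_connected(n, edges):
            counts[degree_sequence(n, edges)] += 1
total = sum(counts.values())
expected = {seq: round(100 * c / total, 1) for seq, c in counts.items()}
```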
48Data mining techniques
- Computation of set patterns using complete solvers
- D-miner.
- Evolution: (edges x aggregated time) boolean matrix
- looking for maximal rectangles of true values (formal concepts)
- maximal frequent subgraphs using a time threshold.
- maximal significant subgraphs using an edge threshold.
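To make "maximal rectangles of true values" concrete, here is a brute-force enumeration on a tiny matrix. It is exponential in the number of edges and is not the D-miner algorithm; it only illustrates what a formal concept is. The matrix and the threshold value are made up.

```python
from itertools import combinations

def formal_concepts(matrix):
    """matrix: dict edge -> set of time steps at which the edge is present.
    Returns the maximal all-true rectangles as (edges, time_steps) pairs of frozensets."""
    edges = list(matrix)
    all_steps = set().union(*matrix.values()) if matrix else set()
    concepts = set()
    for size in range(1, len(edges) + 1):
        for subset in combinations(edges, size):
            steps = set(all_steps)
            for e in subset:
                steps &= matrix[e]           # time steps shared by every edge of the subset
            if not steps:
                continue
            # close the edge set: every edge present at all of these steps
            closed = frozenset(e for e in edges if steps <= matrix[e])
            concepts.add((closed, frozenset(steps)))
    return concepts

matrix = {"ab": {1, 2, 3}, "bc": {2, 3}, "cd": {3, 4}}
concepts = formal_concepts(matrix)
frequent = {c for c in concepts if len(c[1]) >= 2}   # time threshold of 2 steps
```

A threshold on the number of edges can be applied in the same way, by filtering on `len(c[0])`.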
49Identifying social groups
- We obtain 23 316 frequent connected subgraphs
- 10 time steps (10x240) and at least 5 edges
- Most of the subgraphs only cover few individuals
with a low edge density
50Identifying social groups (cont.)
- social group
- frequent and significant connected component
- Only 281 with a density greater than 0.8
- the same sets of vertices are covered many times, or differ by very few individuals or time steps.
- Merging very similar social groups
- A subgraph is defined by a set of edges E and characterized by a set of time steps T
- Two subgraphs (E1,T1) and (E2,T2) are such that if E1 is included in E2 then T2 is included in T1
- As subgraphs are dense, we consider the vertices they cover. We merge two subgraphs if V1?V2 and T1\T2 contains time steps that differ by at most one time step from a time step of T2.
- 15 groups of vertices
51Trajectories among social groups
- Individual 19 enters group 13 (time step 1215)
- Goes to group 9
- Before going to group 10
[Figure: trajectory of individual 19 through groups 13, 9 and 10]
52Random evolving model
- Very simple model matching only the power laws
- On-off sequence for each link based on the real distribution
- Connected components
- Similar results for the number of CCs, lifetime, number of appearances
- Almost tree-like components (linear in the number of nodes)
- Limit size effect for large components.
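One way to read the on-off model is sketched below: each link alternates between up and down periods whose lengths are resampled from the measured contact and inter-contact durations. The sample values, the random initial state and the independence between links are assumptions made for illustration.

```python
import random

def on_off_sequence(contact_samples, inter_contact_samples, horizon, rng):
    """Return the time steps in [0, horizon) at which one simulated link is up.
    Durations are drawn from the two empirical samples (positive integers)."""
    up_steps, t = [], 0
    link_up = rng.random() < 0.5              # arbitrary initial state (assumption)
    while t < horizon:
        pool = contact_samples if link_up else inter_contact_samples
        length = rng.choice(pool)
        if link_up:
            up_steps.extend(range(t, min(t + length, horizon)))
        t += length
        link_up = not link_up                 # alternate between on and off periods
    return up_steps

rng = random.Random(0)                        # fixed seed for reproducibility
pairs = [("a", "b"), ("a", "c"), ("b", "c")]
model = {pair: on_off_sequence([1, 2, 5], [3, 10, 40], 200, rng) for pair in pairs}
```

The resulting per-link sequences can then be turned into snapshots and analysed with the same connected-component measures as the real trace.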
53Evolving connectivity
- Capture the evolution
- Simulation of a dynamic broadcast.
- For a given node u at time t
- T(u,t): flooding tree from u at time t
- u receives the information at time t
- u sends the information as soon as possible to all its neighbours.
- Parameters
- broadcast source
- broadcast starting time
- immediate transmission or not.
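The dynamic broadcast can be sketched as a flooding over successive snapshots. The snapshot representation and the interpretation of the `immediate` flag (several hops within one time step versus one hop per time step) are assumptions; the slides only name the parameter.

```python
def flood(snapshots, source, start, immediate=True):
    """snapshots: list of sets of frozenset({u, v}) links, one set per time step.
    Returns {node: (reception_time, parent)}: the flooding tree T(source, start)."""
    tree = {source: (start, None)}
    for t in range(start, len(snapshots)):
        informed = set(tree)                  # nodes allowed to transmit in this pass
        progress = True
        while progress:
            progress = False
            for link in snapshots[t]:
                u, v = tuple(link)
                for sender, receiver in ((u, v), (v, u)):
                    if sender in informed and receiver not in tree:
                        tree[receiver] = (t, sender)
                        progress = True
            if immediate:
                informed = set(tree)          # new receivers may relay within the same step
            else:
                break                         # at most one hop per time step
    return tree

snapshots = [{frozenset("ab")}, {frozenset("cd")}, {frozenset("bc"), frozenset("cd")}]
tree = flood(snapshots, "a", 0)
slow_tree = flood(snapshots, "a", 0, immediate=False)
```

With immediate transmission, node d is reached at step 2 through c; without it, d is not reached before the trace ends.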
54Fast diffusion
- Diffusion initiated at time t0
- Large component.
55Slow diffusion
- Diffusion initiated during the night (t = 115k)
- End of night: t = 138k
- Maximal number of links at 139k.
56Utility of nodes and links
- Tree structure
- Depth: number of transmitters
- Width, ...
- Utility of a node or link
- Number of nodes in the sub-tree
- Defined from a given source at a given time
- Global utility.
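Given a flooding tree stored as a child-to-parent map, the utility of a node can be sketched as the size of its sub-tree. Counting the node itself is an assumption, and the example tree is made up.

```python
def subtree_sizes(parent):
    """parent: dict node -> parent in the flooding tree (None for the source).
    Returns {node: number of nodes in its sub-tree, the node itself included}."""
    def depth(node):
        d = 0
        while parent[node] is not None:
            node = parent[node]
            d += 1
        return d

    size = {node: 1 for node in parent}
    # process the deepest nodes first so each sub-tree is complete before it is added upward
    for node in sorted(parent, key=depth, reverse=True):
        if parent[node] is not None:
            size[parent[node]] += size[node]
    return size

parent = {"a": None, "b": "a", "c": "b", "d": "b", "e": "a"}
utility = subtree_sizes(parent)
```

The utility of a link (parent, child) can be read off as the sub-tree size of the child. A global utility would combine these values over several sources and starting times; how to combine them is left open here.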
57Conclusions
- Characterization of the evolution is crucial
- Components of individuals play a key role
- Heterogeneous nature of CCs.
- How to describe the evolution of CCs?
- Can we work with other groups/communities?
- Real test bed with 200 nodes / 1 month period
- Data available soon at http://www.worldsens.net
58Future work
- Evolving networks
- Analyze the dynamics
- Introduce new parameters to describe it.
- Dynamical models
- Topology and/or mobility
- Intra-inter duration not sufficient.
- Community structure.
- To be used for protocols simulation.
59Future work (cont.)
- Local/Distributed evolving communities
- Mobile networks
- Detection of intra/inter community links
- Opportunistic routing outside communities
- Other strategies inside.
- Other parameters and related strategies.
- P2P networks
- Detect creation and evolution of communities
- Follow users inside communities.
60Future work (cont.)
- Networks security
- Evolution of topology/routing/use
- Describe typical evolution.
- Measure/detect specific events
- attacks, failures, new use
- small perturbations => strong impact.
- Robustness to attacks
- heterogeneity => sensitivity to targeted attacks
- dynamical context: relation with utility?