Title: Clustering of Interaction Network
1 Clustering of Interaction Network
- Definition
- Process to detect densely connected sub-graphs
- Determines protein complexes or functional modules
- Difficulties
- Noisy data (too many false positives or false negatives)
- Cannot be solved by traditional clustering techniques
- Difficult to define the pair-wise distance between proteins in the network
- Protein complexes may overlap
- Disparate sources of data
- Different reliabilities
- 1750
- Small overlaps
- <17
2 Protein Interaction Network
- Undirected, unweighted graph
- Node represents protein, edge represents interaction
- Example of yeast protein interaction network
- Importance
- Provide a global view of cellular organizations and biological functions
- Applicable to systematic approaches for functional knowledge discovery
- Problem
- Large scale
- Complex connectivity
3 Structural Property
- Small-world Phenomenon (Watts & Strogatz)
- Networks that lie between regular and random networks
- Higher average clustering coefficient than expected by random chance
- Significantly small average shortest path length
- Scale-free Distribution (Barabási & Albert)
- Network growth by preferential attachment
- Power law degree distribution: a few high degree nodes, many low degree nodes
- Clustering coefficient distribution independent of degree
Protein Interaction Database        DIP      MIPS
density                             0.0015   0.0015
average clustering coefficient      0.2283   0.2878
average shortest path length        4.14     4.43
degree distribution exponent (γ)    1.77     1.64
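The first three statistics in the table can be computed directly from an adjacency structure. The following is a minimal sketch, assuming the network is stored as a dictionary mapping each protein to the set of its interaction partners; the graph `toy` and the function names are hypothetical and are not taken from DIP or MIPS.

```python
from collections import deque
from itertools import combinations


def density(adj):
    """Fraction of all possible undirected edges that are present."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return 2 * m / (n * (n - 1))


def average_clustering(adj):
    """Mean over nodes of (edges among neighbors) / (possible neighbor pairs)."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbors contribute 0
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


def average_shortest_path_length(adj):
    """Mean BFS distance over all ordered pairs of mutually reachable nodes."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# Hypothetical toy network: a triangle A-B-C with a tail C-D.
toy = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

print(density(toy), average_clustering(toy), average_shortest_path_length(toy))
```

On a real interaction network the same definitions apply, but the all-pairs BFS is quadratic in the number of proteins, so a graph library would usually be preferred.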
4 Conventional Graph Clustering Approaches
- Density-based Clustering
- Finding densely connected sub-graphs (e.g. maximal clique algorithm)
- Hierarchical Clustering
- Top-down approach: iteratively partitioning a graph (e.g. minimum cut algorithm)
- Bottom-up approach: iteratively merging nodes (e.g. node merging by common neighbors)
- Problems
- Computationally inefficient
- Unable to detect overlapping clusters
- Discard sparsely connected nodes
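As an illustration of the density-based approach, the sketch below enumerates maximal cliques with the basic Bron-Kerbosch recursion. This is a standard textbook algorithm chosen here for illustration, not necessarily the one the slides refer to; the protein names are hypothetical. Its worst-case running time is exponential in the number of nodes, which is in line with the efficiency problem listed above.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion.

    adj maps each node to the set of its neighbors (undirected graph).
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates to extend it, x: already-tried nodes
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# Hypothetical toy network: a triangle P1-P2-P3 plus a chain P3-P4-P5.
toy_ppi = {
    "P1": {"P2", "P3"},
    "P2": {"P1", "P3"},
    "P3": {"P1", "P2", "P4"},
    "P4": {"P3", "P5"},
    "P5": {"P4"},
}

for clique in maximal_cliques(toy_ppi):
    print(sorted(clique))
```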
5 Functional Influence Model
- Functional Flow
- Treat each protein with a known functional annotation as a source of functional flow for that function
- Simulate the spread of this functional flow through the neighborhoods surrounding the sources with a random walk
- Functional score: the amount of flow that the protein has received for that function
[Figure: nodes u and v with annotation Func(a)]
6 Functional Influence
- Functional influence based on distance
- d is the distance between two nodes
7 Functional Influence Model
- Information Flow Simulation
- Computation of the functional influence infs(x) of s on x ∈ V based on shortest paths
- Input: a weighted interaction network and a source node s
- Output: functional influence pattern of s
- Measurements
- PathRatio
- PathRatio models the natural aging or loss of information as it propagates through the network.
- SPath(s,y) is the set of all shortest paths between node s and node y.
- PR(s,y) is the PathRatio between node s and node y.
- PathStrength
- PS(P) measures the strength of path P using the weights on the edges along P.
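The two ingredients named above, the set of shortest paths SPath(s,y) and the strength PS(P) of a path, can be sketched as follows. The slides do not preserve the formulas, so this is only an illustration under stated assumptions: shortest paths are counted in hops, and PS(P) is assumed to be the product of the edge weights along P. The PathRatio itself is not implemented because its definition is not given. All node names and weights are hypothetical.

```python
from collections import deque


def all_shortest_paths(adj, s, y):
    """Return every hop-count shortest path from s to y as a list of nodes."""
    dist = {s: 0}
    parents = {s: []}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                parents[w] = [u]
                queue.append(w)
            elif dist[w] == dist[u] + 1:
                parents[w].append(u)  # another equally short way to reach w
    if y not in dist:
        return []

    def build(node):
        if node == s:
            return [[s]]
        return [path + [node] for p in parents[node] for path in build(p)]

    return build(y)


def path_strength(path, weight):
    """ASSUMPTION: the strength of a path is the product of its edge weights."""
    strength = 1.0
    for u, v in zip(path, path[1:]):
        strength *= weight[frozenset((u, v))]
    return strength


# Hypothetical weighted network with two shortest paths from s to y.
weight = {
    frozenset(("s", "a")): 0.9,
    frozenset(("a", "y")): 0.8,
    frozenset(("s", "b")): 0.5,
    frozenset(("b", "y")): 0.4,
}
adj = {}
for edge in weight:
    u, v = tuple(edge)
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

for path in all_shortest_paths(adj, "s", "y"):
    print(path, path_strength(path, weight))
```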
8 Framework of functional influence simulation
- Algorithm
1. Initialize inf(s).
2. Compute the initial flow I(s → y).
3. Update inf(y).
4. Repeat step 3 for every node in the network.
5. Finally, the functional profile is generated for every node in the network.
- F(d) is the functional distribution model.
- d is the distance between node s and node y.
- PR(s,y) is the PathRatio between node s and node y.
- Inf(s) is the initial functional influence from node s.
- Infs(y) is the functional influence received by node y from node s.
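The overall loop can be sketched as below. The equations for the initial flow and the update step are not preserved in the slides, so this sketch makes two loudly stated assumptions: F(d) is taken to be an exponential decay `decay ** d`, and PR(s,y) is fixed at 1. It shows the shape of the computation (one propagation per source, then one profile per node), not the authors' actual formulas.

```python
from collections import deque


def bfs_distances(adj, s):
    """Hop distance d from source s to every reachable node."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def influence_from(adj, s, initial=1.0, decay=0.5):
    """Influence received by each reachable node y from source s.

    ASSUMPTION: F(d) = decay ** d and PR(s, y) = 1 for every pair.
    """
    return {y: initial * decay ** d for y, d in bfs_distances(adj, s).items()}


def functional_profiles(adj):
    """profiles[y][s] = influence received by node y from source s."""
    profiles = {y: {} for y in adj}
    for s in adj:
        for y, value in influence_from(adj, s).items():
            profiles[y][s] = value
    return profiles


# Hypothetical toy network: a path A - B - C.
chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
profiles = functional_profiles(chain)
print(profiles["C"])
```

Each node ends up with a vector of influences indexed by source, which is the kind of per-node profile a later module detection step could compare between nodes.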
9 Functional Module Detection (FMD)
10 Flowchart for functional module detection
11 Functional Modularity Detection
- Experimental Data
- DIP (4935 proteins, 14162 interactions)
- Evaluation
- Functional categories and annotations from MIPS
- Hypergeometric p-value
- Result
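The hypergeometric p-value used for evaluation is the probability of seeing at least k proteins of a given functional category in a module of size n, when K of the N proteins in the network carry that category. A minimal sketch follows; the function name and the example counts for K, n and k are hypothetical, and only N = 4935 comes from the DIP figure above.

```python
from math import comb


def hypergeom_pvalue(N, K, n, k):
    """P(X >= k) for a module of n proteins drawn from N, K of them annotated."""
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


# Hypothetical module: 20 proteins, 10 of which share a category
# that annotates 100 of the 4935 proteins in the network.
print(hypergeom_pvalue(4935, 100, 20, 10))
```

A smaller p-value means the enrichment of the category in the module is less likely to have arisen by chance.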
12 Computational Epidemiology
- Computational Epidemiology
- A multidisciplinary field that uses computational techniques to develop tools and models that aid epidemiologists in their study of the spread of diseases.
1. Developing a virus spread and containment response model
2. Understanding virus spread and identifying critical properties
3. Applying these findings to real infectious virus spread
4. Analyzing results of the containment strategy (death toll vs. strategies)
13 Virus Spread Network Model
- What do nodes and edges represent in the virus spread network model?
- Node
- Person (community network)
- Town or place (road network)
- Edge
- Interaction (community network)
- Pathway (road network)
- Weight of nodes and edges
- Changed over time t based on the virus spread dynamics model
- Node weight: status of health (0 to 1)
- Edge weight: status of strength (0 to 1)
14 Model Scheme
- Spread Model
- Spreading phase: edges in the spreading region are damaged
- Defense Model
- Signaling and propagation phase: nodes with a certain number of damaged edges send signals to neighbor nodes
- Defense action phase: nodes that receive a certain level of signals from neighbor nodes remove all of their edges
Virus progression to neighbor nodes
Signaling alarms to neighbor nodes from infected neighbor node
Culling nodes to prevent virus progression
15 Spread Model
- Spreading Model
- Simulating disease spreading
- Damaging nodes and edges that lie within the virus spread radius from the center
- Virus spread radius given by r(t)
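One way to picture the spreading phase is as a growing disc. The sketch below marks an edge as damaged once both of its endpoints lie within r(t) of the outbreak center. The linear form of r(t), the both-endpoints rule, and all names and coordinates are assumptions made for illustration; the slides do not specify them.

```python
from math import hypot


def spread_radius(t, speed=1.0):
    """ASSUMPTION: the radius grows linearly with time, r(t) = speed * t."""
    return speed * t


def damaged_edges(positions, edges, center, t, speed=1.0):
    """Edges whose two endpoints both lie within r(t) of the center."""
    r = spread_radius(t, speed)
    cx, cy = center
    inside = {
        node
        for node, (x, y) in positions.items()
        if hypot(x - cx, y - cy) <= r
    }
    return {(u, v) for u, v in edges if u in inside and v in inside}


# Hypothetical road network: three places on a line.
positions = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (3.0, 0.0)}
roads = {("A", "B"), ("B", "C")}

for t in (1, 2, 3):
    print(t, sorted(damaged_edges(positions, roads, center=(0.0, 0.0), t=t)))
```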
16 Defense Model
- Defense Model
- Simulating the defense system against disease spreading, and message spreading
- Culling interactions from damaged nodes in order to stop spreading (edge culling in green circles)
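A single round of the defense model (signaling, then culling) might look like the sketch below. The two thresholds, their default values, and the function name are hypothetical parameters introduced for illustration; the slides only say "a certain number" of damaged edges and "a certain level" of signals.

```python
def defense_step(edges, damaged, alarm_threshold=1, cull_threshold=1):
    """One signaling-and-culling round; returns (culled_nodes, remaining_edges).

    edges and damaged are sets of (u, v) tuples; damaged is a subset of edges.
    """
    # Count damaged edges per node and collect each node's neighbors.
    damage_count, neighbors = {}, {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
        if (u, v) in damaged:
            damage_count[u] = damage_count.get(u, 0) + 1
            damage_count[v] = damage_count.get(v, 0) + 1

    # Signaling phase: sufficiently damaged nodes alert each neighbor.
    signals = {}
    for node, count in damage_count.items():
        if count >= alarm_threshold:
            for nbr in neighbors[node]:
                signals[nbr] = signals.get(nbr, 0) + 1

    # Defense action phase: nodes with enough signals drop all of their edges.
    culled = {node for node, s in signals.items() if s >= cull_threshold}
    remaining = {(u, v) for u, v in edges if u not in culled and v not in culled}
    return culled, remaining


# Hypothetical chain A-B-C-D-E where the edge A-B is already damaged.
network = {("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")}
culled, remaining = defense_step(network, damaged={("A", "B")})
print(sorted(culled), sorted(remaining))
```

In this toy run the nodes next to the damage cut themselves off, which leaves the edge D-E intact behind the break.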
17 Problem / Solution Approach
- Which element of the virus spread system has the greatest impact on a containment campaign?
- Identifying the critical elements of the system by computational modeling and stochastic simulation.
- How to plan an effective containment campaign that minimizes the damage from virus spread?
- Mining the best combination of critical parameters under certain conditions.
[Diagram: Parameters, Critical parameter, Simulation Analysis]
18 Application
- Virus spread simulation on the road network of the city of Oldenburg, Germany
- Green edges: healthy edges
- Red edges: edges damaged by the spread process
- Blue edges: edges damaged by the defense process
Uncontrolled ? 0.02
Intermediate ? 0.12
Controlled ? 0.22
19 Osteoporosis
- Osteoporosis
- Definition: a systemic skeletal disease characterized by low bone mass and micro-architectural deterioration of bone tissue, leading to enhanced bone fragility and a consequent increase in fracture risk
- 25 million people in the United States are affected.
- 10 billion dollars are spent on medical costs, including rehabilitation and treatment facilities.
- Research funding will be 200 billion by the year 2040
[Figure panels: Normal, Osteoporosis]
20 Challenges
- Diagnosis of Osteoporosis?
- The traditional method of evaluating bone strength is to assess bone mineral density (BMD).
- Limitations of BMD
- A major limitation of BMD is that it incompletely reflects variation in bone strength.
- Other factors, such as bone microarchitecture, contribute substantially to bone strength.
- By evaluating bone microstructure we can improve the determination of bone quality and strength.
Computational Model on Bone Microstructure
21 Computational Model on Bone Microstructure
- Questions
- What is a better way to evaluate bone strength?
- How can we identify fragile locations of the bone structure?
- Why not approach this problem from a new direction?
- Consider the problem from a structural point of view.
- Graph-based approach to bone microstructure
- Bone microstructure contributes to bone strength.
- We represent rod-like mineral fibers as edges in a graph.
- It is capable of quantitative assessment of bone mineral density and bone micro-architecture.
22 Model Approach
- Bone is not a uniformly solid material, but rather has some spaces between its hard elements.
- Designing a network approach model for the bone microstructure.
- Quantitative assessment of bone mineral density could be successfully done with this approach.
23 Bone Network Model
- Creating Bone Network
- A femur bone image from a DXA scan of a patient with osteoporosis.
- By image profiling of the DXA scan image, we create a bone network based on the bone density.
- What do nodes and edges represent in the bone network model?
- Node: fiber binding point for bone cell movements and biochemical interactions
- Edge: a group of mineralized fibers
- Weight of nodes and edges
- Node weight: average weight of directly connected edges
- Edge weight: strength status of mineralized fibers
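The node weight rule (average weight of the directly connected edges) is simple enough to state in code. The sketch below assumes the bone network is stored as a dictionary from node pairs to edge strengths; the node names and strength values are hypothetical.

```python
def node_weights(edge_weights):
    """Node weight = mean weight of the edges directly connected to the node."""
    incident = {}
    for (u, v), w in edge_weights.items():
        incident.setdefault(u, []).append(w)
        incident.setdefault(v, []).append(w)
    return {node: sum(ws) / len(ws) for node, ws in incident.items()}


# Hypothetical bone network: two fiber groups meeting at binding point n2.
fibers = {("n1", "n2"): 0.75, ("n2", "n3"): 0.25}
print(node_weights(fibers))
```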
24 Problem / Solution Approach
- What alternatives to bone mineral density (BMD) are there for determining the strength of bone?
- → Designing a computational model of bone microstructure.
- How can we identify fragile locations of the bone structure?
- → Creating algorithms for mining weak locations from a computational model of bone microstructure.
[Figure panels: Human Bone, Bone Model]
25 Identifying Critical Locations
- Information Propagation Model
- An algorithm to find critical edges in the bone network
- Measure the quantity of stress energy in each edge
- Cut the most critical edge found by the Information Propagation Model
- Run iteratively to find the next critical edges
- Stop when the first isolated network appears
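The iterative cutting loop can be sketched as follows. The slides do not say how stress energy is computed, so this sketch swaps in a much simpler stand-in: the weakest remaining edge is assumed to be the most critical one. Only the control flow (score, cut, repeat, stop when the network first splits) is meant to mirror the description above; the function names and the example network are hypothetical.

```python
def is_connected(nodes, edges):
    """Depth-first check that every node is reachable from an arbitrary start."""
    if not nodes:
        return True
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(nodes)


def critical_edges(edge_weights):
    """Cut edges in order of criticality until the network first splits.

    ASSUMPTION: the weakest remaining edge stands in for the edge with the
    highest stress energy, which the slides do not define.
    """
    remaining = dict(edge_weights)
    nodes = {n for edge in remaining for n in edge}
    cut = []
    while remaining and is_connected(nodes, remaining):
        weakest = min(remaining, key=remaining.get)
        del remaining[weakest]
        cut.append(weakest)
    return cut


# Hypothetical bone network: a square a-b-c-d with unequal fiber strengths.
square = {("a", "b"): 0.9, ("b", "c"): 0.2, ("c", "d"): 0.7, ("d", "a"): 0.5}
print(critical_edges(square))
```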
26 Conclusions
- Various applications are generating data very rapidly and in great volume, demanding data mining approaches.
- Network-based approaches look promising for solving complex problems.
- This research requires close collaboration among multidisciplinary groups.
- Semi-supervised approaches that integrate domain knowledge into data mining tools are important to the success of the research.