Clustering of Interaction Network - PowerPoint PPT Presentation

About This Presentation
Title:

Clustering of Interaction Network

Description:

Title: Data Mining Approaches to Genomic Data Analysis
Last modified by: azhang
Created Date: 3/22/2001 12:46:03 PM
Document presentation format – PowerPoint PPT presentation

Slides: 27
Provided by: cseBuffal9
Learn more at: https://cse.buffalo.edu

Transcript and Presenter's Notes


1
Clustering of Interaction Network
  • Definition
  • Process to detect densely connected sub-graphs
  • Determines protein complexes or functional
    modules
  • Difficulties
  • Noisy data (too many false positives or false
    negatives)
  • Cannot be solved by traditional clustering
    techniques
  • Difficult to define the pair-wise distance
    between proteins in the network.
  • Protein complexes may overlap.
  • Disparate sources of data
  • Different reliabilities
  • 1750
  • Small overlaps
  • <17

2
Protein Interaction Network
  • Undirected, unweighted graph
  • Node represents protein, edge represents
    interaction
  • Example of Yeast protein interaction network
  • Importance
  • Provide a global view of cellular organizations
    and biological functions
  • Applicable to systematic approaches for
    functional knowledge discovery
  • Problem
  • Large scale
  • Complex connectivity

3
Structural Property
  • Small-world Phenomenon (Watts & Strogatz)
  • Networks that lie between regular and random
    networks
  • Higher average clustering coefficient than
    expected by random chance
  • Significantly small average shortest path length
  • Scale-free Distribution (Barabási & Albert)
  • Network growth by preferential attachment
  • Power-law degree distribution: a few high-degree
    nodes, many low-degree nodes
  • Clustering coefficient distribution independent
    of degree

Protein Interaction Database        DIP      MIPS
density                             0.0015   0.0015
average clustering coefficient      0.2283   0.2878
average shortest path length        4.14     4.43
degree distribution exponent (γ)    1.77     1.64
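
The three structural measures in the table can be computed directly from an adjacency structure. The following is a minimal sketch on a hypothetical four-node toy graph, not the DIP or MIPS data; the function names and the toy edges are assumptions made for illustration.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Build an undirected, unweighted graph as {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def density(adj):
    """Fraction of possible node pairs that are connected by an edge."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves linked."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    return sum(clustering_coefficient(adj, v) for v in adj) / len(adj)


def average_shortest_path_length(adj):
    """Mean hop distance over all reachable ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Hypothetical toy network: a triangle A-B-C with D attached to C.
toy = build_graph([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
print(density(toy), average_clustering(toy), average_shortest_path_length(toy))
```

A breadth-first search from every node is adequate for a sketch; on a network of several thousand proteins it is still feasible, but a graph library would normally be preferred.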
4
Conventional Graph Clustering Approaches
  • Density-based Clustering
  • Finding densely connected sub-graphs (e.g.
    maximal clique algorithm)
  • Hierarchical Clustering
  • Top-down approach: iteratively partitioning a
    graph (e.g. minimum cut algorithm)
  • Bottom-up approach: iteratively merging nodes
    (e.g. node merging by common neighbors)
  • Problems
  • Computationally inefficient
  • Unable to detect overlapping clusters
  • Discard sparsely connected nodes
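
As an illustration of the density-based approach, the sketch below enumerates maximal cliques with the basic Bron-Kerbosch recursion (without pivoting). The slide only names a "maximal clique algorithm", so the choice of Bron-Kerbosch and the toy graph are assumptions. The recursion is exponential in the worst case, which is consistent with the efficiency problem listed above.

```python
def maximal_cliques(adj):
    """Enumerate all maximal cliques of an undirected graph.

    adj maps each node to the set of its neighbors.
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-processed nodes.
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)


# Hypothetical toy graph: two triangles sharing node C.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D", "E"},
    "D": {"C", "E"},
    "E": {"C", "D"},
}
print(maximal_cliques(adj))
```

In this toy graph node C belongs to both cliques, so clique enumeration by itself does report overlapping groups; the difficulty with overlap arises mainly in the partition-style hierarchical methods.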

5
Functional Influence Model
  • Functional Flow
  • Treat each protein with a known functional
    annotation as a source of functional flow for
    that function.
  • Simulate the spread of this functional flow
    through the neighborhoods surrounding the sources
    with a random walk.
  • Functional score: the amount of flow that a
    protein has received for that function.

[Figure labels: u, v, Func(a)]
6
Functional Influence
  • Functional Influence based on Distance.
  • Weibull Distribution
  • Curve Fitting

d is the distance between two nodes
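
For reference, the standard two-parameter Weibull density as a function of distance d can be sketched as follows. The shape and scale values used here are hypothetical placeholders; the slide does not give the fitted parameters, and treating non-positive distances as carrying no influence is an assumption of this sketch.

```python
import math


def weibull_pdf(d, shape, scale):
    """Two-parameter Weibull density evaluated at distance d.

    f(d) = (shape / scale) * (d / scale)**(shape - 1) * exp(-(d / scale)**shape)
    Non-positive distances return 0.0 (an assumption of this sketch).
    """
    if d <= 0:
        return 0.0
    z = d / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


# Hypothetical parameters, chosen only to show the shape of the curve.
for hops in range(1, 6):
    print(hops, round(weibull_pdf(hops, shape=2.0, scale=2.0), 4))
```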
7
Functional Influence Model
  • Information Flow Simulation
  • Computation of the functional influence inf_s(x)
    of s on x ∈ V based on shortest paths
  • Input: a weighted interaction network and a
    source node s
  • Output: the functional influence pattern of s
  • Measurements
  • PathRatio
  • PathRatio models the natural decay, or loss, of
    information as it propagates through the network.
  • SPath(s,y) is the set of all shortest paths
    between node s and node y.
  • PR(s,y) is the PathRatio between node s and node
    y.
  • PathStrength
  • PS(P) measures the strength of path P
    using the weights on the edges along P.
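
A minimal sketch of SPath(s,y) and PS(P) follows. Two assumptions are made because the transcript does not preserve the formulas: shortest paths are measured in hops, and PS(P) is taken to be the product of the edge weights along P. PathRatio is not computed here, since its definition is not given.

```python
from collections import deque


def all_shortest_paths(adj, s, y):
    """Return every hop-count shortest path from s to y as a list of nodes."""
    dist = {s: 0}
    preds = {s: []}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                preds[v] = [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                preds[v].append(u)  # another equally short way to reach v
    if y not in dist:
        return []

    def backtrack(node):
        if node == s:
            return [[s]]
        return [path + [node] for p in preds[node] for path in backtrack(p)]

    return sorted(backtrack(y))


def path_strength(weights, path):
    """Assumed PS(P): product of the edge weights along the path."""
    strength = 1.0
    for u, v in zip(path, path[1:]):
        strength *= weights[frozenset((u, v))]
    return strength


# Hypothetical weighted network with two shortest paths from s to y.
adj = {"s": ["a", "b"], "a": ["s", "y"], "b": ["s", "y"], "y": ["a", "b"]}
weights = {
    frozenset(("s", "a")): 0.9,
    frozenset(("a", "y")): 0.8,
    frozenset(("s", "b")): 0.5,
    frozenset(("b", "y")): 0.4,
}
paths = all_shortest_paths(adj, "s", "y")
for p in paths:
    print(p, path_strength(weights, p))
```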

8
Framework of functional influence simulation
  • Algorithm
  1. Initialize inf(s).
  2. Compute the initial flow I(s → y) (formula not
     preserved in this transcript).
  3. Update inf(y) (formula not preserved in this
     transcript).
  4. Repeat step 3 for every node in the network.
  5. Finally, the functional profile is generated
     for every node in the network.

F(d) is the functional distribution model. d is
the distance between node s and node y. PR(s,y)
is the PathRatio between node s and node y.
inf(s) is the initial functional influence from
node s. inf_s(y) is the functional influence
received by node y from node s.
9
Functional Module Detection (FMD)
10
FlowChart for functional module detection
11
Functional Modularity Detection
  • Experimental Data
  • DIP (4935 proteins, 14162 interaction)
  • Evaluation
  • Functional categories and annotations from MIPS
  • Hyper-geometric p-value
  • Result
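
The hyper-geometric p-value used for evaluation can be sketched as below: it is the probability of seeing at least k annotated proteins in a module of size n when the network has N proteins, K of which carry the annotation. The variable names and the example numbers are illustrative assumptions, not results from the presentation.

```python
from math import comb


def hypergeometric_p_value(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: proteins in the network, K: proteins annotated with the function,
    n: proteins in the module, k: module proteins with the annotation.
    """
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


# Hypothetical example: 10 proteins, 3 annotated, module of 2, 1 annotated hit.
print(hypergeometric_p_value(10, 3, 2, 1))
```

A smaller p-value means the module is less likely to have gathered that many proteins of one functional category by chance.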

12
Computational Epidemiology
  • Computational Epidemiology
  • is a multidisciplinary field that uses
    computational techniques to develop tools and
    models that aid epidemiologists in studying the
    spread of diseases.

1. Developing a virus spread and containment
response model
2. Understanding virus spread and identifying
critical properties
3. Applying these findings to real infectious
virus spread
4. Analyzing results of the containment strategy
(death toll vs. strategies)
13
Virus Spread Network Model
  • What do nodes and edges represent in the virus
    spread network model?
  • Node
  • Person (community network)
  • Town or place (road network)
  • Edge
  • Interaction (community network)
  • Pathway (road network)
  • Weight of nodes and edges
  • Changes over time t according to the virus
    spread dynamics model
  • Node weight: health status (0 to 1)
  • Edge weight: strength status (0 to 1)

14
Model Scheme
  • Spread Model
  • Spreading phase: edges within the spreading
    region are damaged
  • Defense Model
  • Signaling and propagation phase: nodes with a
    certain number of damaged edges send signals to
    their neighbor nodes
  • Defense action phase: nodes that receive a
    certain level of signals from neighbor nodes
    remove all of their edges

Virus progression to neighbor nodes
Signaling alarms to neighbor nodes from an infected
neighbor node
Culling nodes to prevent virus progression
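
One round of the three phases could be sketched as follows. This is a simplified illustration under several assumptions that the slides do not specify: the spreading region is taken to be the live edges touching an infected node, the two thresholds are hypothetical parameters, and only healthy nodes perform the defense action.

```python
def edge(u, v):
    """Undirected edge key."""
    return frozenset((u, v))


def step(adj, infected, removed, damage_threshold=1, signal_threshold=1):
    """Run one spreading / signaling / defense round.

    adj: {node: set(neighbors)}; infected: set of nodes;
    removed: set of edges already culled. Returns (infected, removed).
    """
    # Spreading phase: live edges touching an infected node are damaged.
    damaged = {
        edge(u, v) for u in infected for v in adj[u] if edge(u, v) not in removed
    }

    # Signaling phase: nodes with enough damaged edges alert their neighbors.
    signals = {}
    for node in adj:
        n_damaged = sum(1 for v in adj[node] if edge(node, v) in damaged)
        if n_damaged >= damage_threshold:
            for v in adj[node]:
                if edge(node, v) not in removed:
                    signals[v] = signals.get(v, 0) + 1

    # Defense action phase: alerted healthy nodes cut all of their edges.
    culled = {n for n, s in signals.items() if s >= signal_threshold and n not in infected}
    new_removed = set(removed)
    for n in culled:
        for v in adj[n]:
            new_removed.add(edge(n, v))

    # The virus advances only along damaged edges that were not culled.
    new_infected = set(infected)
    for e in damaged - new_removed:
        new_infected |= e
    return new_infected, new_removed


# Hypothetical path network A-B-C-D with the outbreak starting at A.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
print(step(adj, {"A"}, set(), signal_threshold=1))
print(step(adj, {"A"}, set(), signal_threshold=2))
```

With the lower signal threshold the neighbors cut their edges before the virus advances; with the higher one no node reacts and the infection reaches B.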
15
Spread Model
  • Spreading Model
  • Simulating disease spreading
  • Damaging nodes and edges that lie within the
    virus spread radius of the center
  • Virus spread by r(t)
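
A radius-based spreading step might look like the sketch below. The linear form of r(t), the node coordinates, and the rule that an edge is damaged when at least one endpoint lies inside the radius are all assumptions; the slide only states that spread is governed by r(t).

```python
import math


def spread_radius(t, speed=1.0):
    """Hypothetical r(t): the radius grows linearly with time."""
    return speed * t


def damaged_nodes(positions, center, t):
    """Nodes whose coordinates lie within r(t) of the outbreak center."""
    r = spread_radius(t)
    return {n for n, xy in positions.items() if math.dist(xy, center) <= r}


def damaged_edges(edges, positions, center, t):
    """Edges with at least one endpoint inside the radius (an assumption)."""
    hit = damaged_nodes(positions, center, t)
    return {(u, v) for u, v in edges if u in hit or v in hit}


# Hypothetical road network laid out on a line.
positions = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (3.0, 0.0)}
edges = [("A", "B"), ("B", "C")]
print(damaged_nodes(positions, (0.0, 0.0), 1.0))
print(damaged_edges(edges, positions, (0.0, 0.0), 0.5))
```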

16
Defense Model
  • Defense Model
  • Simulating defense system of disease spreading
    and message spreading
  • Culling interactions from damaged nodes in order
    to stop spreading (Edge Culling in Green Circles)

17
Problem / Solution Approach
  • Which element of the virus spread system has the
    greatest impact on a containment campaign?
  • Identifying the critical elements of the system
    by computational modeling and stochastic
    simulation.
  • How can we plan an effective containment campaign
    that minimizes the damage from virus spread?
  • Mining the best combination of critical
    parameters under given conditions.

Parameters
Critical parameter
Simulation Analysis
18
Application
  • Virus spread simulation on the road network of
    the city of Oldenburg, Germany
  • Green edges: healthy edges
  • Red edges: edges damaged by the spread process
  • Blue edges: edges damaged by the defense process

Uncontrolled ? 0.02
Intermediate ? 0.12
Controlled ? 0.22


19
Osteoporosis
  • Osteoporosis
  • Definition: a systemic skeletal disease
    characterized by low bone mass and
    micro-architectural deterioration of bone tissue,
    leading to enhanced bone fragility and a
    consequent increase in fracture risk
  • 25 million people in the United States are
    affected.
  • 10 billion dollars are spent on medical costs,
    including rehabilitation and treatment
    facilities.
  • Research funding will reach 200 billion by the
    year 2040.

Normal
Osteoporosis
20
Challenges
  • Diagnosis of Osteoporosis?
  • Traditional method of evaluating bone strength is
    by assessing bone mineral density (BMD).
  • Limitations on BMD
  • A major limitation of BMD is that it incompletely
    reflects variation in bone strength.
  • Other factors like bone microarchitecture
    contribute substantially to bone strength
  • By evaluating bone microstructure we can improve
    determination of bone quality and strength

Computational Model on Bone Microstructure
21
Computational Model on Bone Microstructure
  • Questions
  • What is the better way to evaluate bone strength?
  • How can we identify fragile locations of the bone
    structure?
  • Why don't we approach this problem from a new
    direction?
  • Let us consider the problem from a structural
    point of view.
  • Graph-based approach to bone microstructure
  • Bone microstructure contributes to bone strength.
  • We represent rod-like mineral fibers as edges in
    a graph.
  • The approach is capable of quantitative
    assessment of bone mineral density and bone
    micro-architecture.

22
Model Approach
  • Bone is not a uniformly solid material, but
    rather has some spaces between its hard elements.
  • Designing a network approach model for the bone
    microstructure.
  • Quantitative assessment of bone mineral density
    could be successfully done with this approach.

23
Bone Network Model
  • Creating Bone Network
  • Femur images of patients with osteoporosis are
    obtained by DXA scan.
  • By profiling the DXA scan image, we create a
    bone network based on bone density.
  • What do nodes and edges represent in the bone
    network model?
  • Node: fiber binding point for bone cell movements
    and biochemical interactions
  • Edge: a group of mineralized fibers
  • Weight of nodes and edges
  • Node weight: average weight of directly connected
    edges
  • Edge weight: strength status of mineralized
    fibers
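
The node-weight rule stated above (the average weight of the directly connected edges) can be sketched in a few lines. The edge weights in the example are hypothetical values, not measurements from a DXA scan.

```python
def node_weights(edge_weights):
    """Node weight = mean weight of the edges incident to that node.

    edge_weights maps (u, v) tuples to an edge strength.
    """
    totals, counts = {}, {}
    for (u, v), w in edge_weights.items():
        for n in (u, v):
            totals[n] = totals.get(n, 0.0) + w
            counts[n] = counts.get(n, 0) + 1
    return {n: totals[n] / counts[n] for n in totals}


# Hypothetical bone network fragment with two mineralized-fiber edges.
bone_edges = {("a", "b"): 0.8, ("b", "c"): 0.4}
print(node_weights(bone_edges))
```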

24
Problem / Solution Approach
  • What alternatives to bone mineral density (BMD)
    are there for determining bone strength?
  • → Designing a computational model of bone
    microstructure.
  • How can we identify fragile locations of the bone
    structure?
  • → Creating algorithms for mining weak locations
    from a computational model of bone microstructure.

Human Bone
Bone Model
25
Identifying Critical Locations
  • Information Propagation Model
  • An algorithm to find critical edges in the bone
    network
  • Measure the quantity of stress energy in each
    edge
  • Cut the most critical edge identified by the
    Information Propagation Model
  • Run iteratively to find the next critical edges
  • Stop when the first isolated network appears
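
The iterative cutting loop can be sketched as follows. The Information Propagation Model's stress-energy score is not given in the transcript, so this sketch substitutes a simple stand-in: the edge with the lowest strength weight is treated as the most critical. Only the loop structure (cut, re-check, stop at the first disconnection) follows the slide.

```python
def is_connected(nodes, edges):
    """Depth-first check that every node is reachable from the first one."""
    if not nodes:
        return True
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(nodes)


def cut_until_disconnected(edge_weights):
    """Cut the weakest edge repeatedly until the network first splits.

    Stand-in criticality: lowest edge weight (ties broken by edge name).
    Returns the edges in the order they were cut.
    """
    remaining = dict(edge_weights)
    nodes = {n for e in remaining for n in e}
    cuts = []
    while remaining and is_connected(nodes, remaining):
        critical = min(remaining, key=lambda e: (remaining[e], e))
        cuts.append(critical)
        del remaining[critical]
    return cuts


# Hypothetical four-node ring of mineralized-fiber edges.
ring = {("a", "b"): 0.9, ("b", "c"): 0.2, ("c", "d"): 0.7, ("d", "a"): 0.5}
print(cut_until_disconnected(ring))
```

The returned list is the sequence of critical edges, and the last entry is the cut that isolates part of the network.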

26
Conclusions
  • Various applications are generating data very
    rapidly and in great volume, demanding data
    mining approaches.
  • Network-based approaches look promising to solve
    complex problems.
  • This research requires close collaboration among
    multidisciplinary groups.
  • Semi-supervised approaches to integrate domain
    knowledge into data mining tools are important to
    the success of the research.