Biological Networks - PowerPoint PPT Presentation

About This Presentation
Title:

Biological Networks

Description:

Biological Networks Graph Theory and Matrix Theory Ka-Lok Ng Department of Bioinformatics Asia University Content Topological Statistics of the protein ... – PowerPoint PPT presentation

Number of Views:189
Avg rating:3.0/5.0
Slides: 68
Provided by: Albe3154
Category:

less

Transcript and Presenter's Notes

Title: Biological Networks


1
Biological Networks Graph Theory and Matrix
Theory
  • Ka-Lok Ng
  • Department of Bioinformatics
  • Asia University

2
Content
  • Topological Statistics of the protein interaction
    networks
  • How to characterize a network ?
  • Graph theory, topological parameters (node
    degrees, average path length, clustering
    coefficient, and node degree correlation.)
  • Random graph, Scale-free network, Hierarchical
    network
  • Evolution of Biological Networks

3
Biological Networks - metabolic networks
  • Metabolism is the most basic network of
    biochemical reactions, which generate energy for
    driving various cell processes, and degrade and
    synthesize many different bio-molecules.

4
Biological Networks - Protein-protein
interaction network (PIN)
  • Proteins perform distinct and well-defined
    functions, but little is known about how
  • interactions among them are structured at the
    cellular level. Protein-protein interaction
    account for binding interactions and formation of
    protein complex.
  • - Experiment Yeast two-hybrid method, or
    co-immunoprecipitation

Limitation No subcellular location, and
temporal information.
Cliques protein complexes ?
www.utoronto.ca/boonelab/proteomics.htm
5
Biological Networks - PIN
  • Yeast Protein-protein interaction network
  • - protein-protein interactions are not random
  • - highly connected proteins are unlikely to
    interact with each other.

Not a random network
  • Data from the high-
  • throughput two-hybrid
  • experiment (T. Ito, et al.
  • PNAS (2001) )
  • The full set containing
  • 4549 interactions among
  • 3278 yeast proteins
  • 87 nodes in the largest
  • component
  • kmax 285 !
  • Figure shows nuclear
  • proteins only

6
Biological Networks Gene regulation networks
In a gene regulatory network, the protein encoded
by a gene can regulate the expression of other
genes, for instance, by activating or inhibiting
DNA transcription. These genes in turn produce
new regulatory proteins that control other genes.
Example of a genetic regulatory network of two
genes (a and b), each coding for a regulatory
protein (A and B).
7
Biological Networks Gene regulation networks
Transcription regulatory network in H.
sapiens Data courtesy of Ariadne Genomics
obtained from the literature search 1449
regulations among 689 proteins
Transcription regulatory network in E. coli Data
(courtesy of Uri Alon) was curated from the
Regulon database 606 interactions between 424
operons (by 116 TFs)
Transcription regulatory network in Yeast -
From the YPD database 1276 regulations among
682 proteins by 125 transcription factors (10
regulated genes per TF) - Part of a bigger
genetic regulatory network of 1772 regulations
among 908 proteins
8
Biological Networks Signal transduction
networks
Hormones (first message) Receptor cAMP, Ca
(second message) phosphorylation
  • Nuclear transcription factor NF-kB
  • control of apoptosis (cell suicide),
  • development of B and T cells,
  • anti-viral and bacterial responses

Oxidant-induced activation of NF-kB signal
transduction
9
Biological Networks Cell cycle regulation
networks
10
Biological Networks
Biological networks are not randomly
connected Underlying architecture ?
clustering How to characterize ? An universal
features across different species ?
11
Graph theory- The Bridge Obsession Problem
Find a tour crossing every bridge just once
Leonhard Euler (Switzerland), 1735. It turns out
it is impossible.
L. Euler 1707-1783
Bridges of Königsberg
12
Eulerian Cycle Problem
  • Find a cycle that visits every edge exactly once
  • Linear time

More complicated Königsberg
13
Hamiltonian Cycle Problem
  • Find a cycle that visits every vertex exactly
    once
  • Around the 20 famous cities in the world
  • NP complete

Sir W. Hamilton (English mathematician) 1805 -
1865
Game invented by Sir William Hamilton in 1857
14
Mapping Problems to Graphs
  • Arthur Cayley (English mathematician) studied
    chemical structures of hydrocarbons in the
    mid-1800s
  • He used trees (acyclic connected graphs) to
    enumerate structural isomers

Arthur Cayley
15
Real networks
  • Many networks show scale-free behavior
  • World-Wide Web
  • Internet
  • Ecology network (food web)
  • Science collaboration network
  • Movie actor collaboration network
  • Cellular network
  • Network in linguistic

Power law behavior
16
Graph Theory
Binary relation Pathway
Cluster Hierarchical Tree
A
B
Network representation. A network (graph)
consists of a set of elements (vertices) and a
set of binary relations (edges).
17
Graph Theory Basic concepts
Graphs G(N,E) Nn1 n2,... nN Ee1 e2,...
eM ekni nj Nodes proteins Edges protein
interactions Mutligraph ekni nj duplicate
edges i.e. emni nj Nodes proteins Edges
interactions of different sort? binding and
similarity
Hypergraphs Hyperedge exni, nj, nk ... Nodes
proteins Edges protein complexes Directed
hypergraph Hyperedge exni, nj .. nk nl
... Nodes substances Edges chemical reactions
A B ? C D eXA, B .. C, D ... Directed
graph ekni nj Nodes genes and their
products Edges from A to B gene regulation
? gene A regulates expression of gene B
Different systems ? Different graphs
18
Graph Theory Basic concepts
  • Node degree
  • Components
  • Complete graph (Clique)
  • Shortest path length

Clustering coefficient Ci if A-B, B-C, then it is
highly probable that A-C
Two ways to compute Ci -Ei actual connections out
of Ck2 possible connections -number of triangles
that included i/ki(ki-1)
Average clustering coefficient
19
Graph Theory Vertex adjacency matrix
Undirected graph
ki 1 3 1 1
symmetric
- 8 means not directly connected - node i
connectivity, ki countj(mij 1)
Bipartite graph
20
Graph Theory Edge adjacency matrix
a b c d
a b c d
c
The edge adjacency matrix (E) of a graph G is
identical to vertex adjacency matrix (A) of the
line graph of G, L(G). That is the edge in G are
replaced by vertices in L(G). Two vertices in
L(G) are connected whenever the corresponding
edges in G are adjacent.
The labeling of the same graph G are related by a
similarity transformation, P-1A(G1)PA(G2).
21
Graph Theory average network distance
  • Interaction path length or average network
    distance, d
  • the average of the distances between all pairs
    of nodes
  • frequency of the shortest interaction path
    length, f(j)
  • determined by using the Floyds algorithm
  • The average network diameter d is given by
  • where j is the shortest path length between two
    nodes.
  • Network diameter (global) ? Average network
    distance (local)

22
Graph Theory the shortest path
  • The shortest path
  • Floyd algorithm, an O(N3) algorithm. For
    iteration n,
  • given three nodes i, j and k, it is shorter to
    reach j from i by passing through k
  • MnijminMn-1ij, Mn-1ikMn-1kj
  • - search for all possible paths,
  • e.g. 1-2, 1-2-3, 1-2-4, 2-3, 2-4

23
Graph Theory number of the shortest path in a
graph
  • A nonvanishing element of A(G), Aij 1,
    represents a walk of a length between the
    vertices i and j. Therefore, in general

if there is a walk of length one between vertices
i and j otherwise
There are walks of various lengths which can be
found in a given graph. Thus
if there is a walk of length two between vertices
i and j passing through the vertex k otherwise
Therefore, the expression
represents the total number of walks of
the length 2 in G between the vertices i and
j. For a walk of a length L, we
24
Graph Theory Trace of a matrix
  • Trace of the NxN matrix A
  • In the case of the adjacency matrix for graph
    without loops, Tr A 0
  • The trace of powers of A is a graph invariant

where M is the number of edges, C3 is the number
of three-membered cycles. In case of graph with
n loops
25
Random Graph Theory
Graph Theory Probability
26
Random Graph Theory
Graph Theory Probability
27
Random Graph Theory Graph Theory Probability
  • Random graph (Erdos and Renyi, 1960)
  • N nodes labeled and connected by n edges
  • CN2 N(N-1)/2 possible edges
  • possible graphs with N nodes and n edges

N 4 ? C6n
n Number of possible graphs, C6n
1 6
2 15
3 20
4 15
5 6
6 1
N 4
n 3 3 4 4
5 6
28
Random Graph Theory Random network, Scale free
network
  • Connectivity distribution P(k)
  • In a random network, the links are randomly
    connected and most of the nodes have degrees
    close to ltkgt2E/N. The degree distribution P(k)
    vs. k is a Poission distribution,
  • i.e. P(k) ltkgtke-ltkgt/k!
  • In many real life networks, the degree
    distribution has no well-defined peak but has a
    power-law distribution, P(k) k-g ,where g is a
    constant. Such networks are known as scale-free
    network.
  • Random network? LogP(k) vs Logk plot has a
    peak
  • ? homogenous nodes
  • ? d log N
  • Scale-free network ? LogP(k) vs Logk plot is
    a line with negative slope
  • ? inhomogenous
    nodes
  • ? d log(log N)
  • Albert R. and Barabasi A.L.(2002) Rev. Mod. Phys.
    74, 47

Random network
Scale-free network
http//physicsweb.org/box/world/
29
Example metabolic pathways
  • WIT database (43 organisms), node
    substrated, edge reaction
  • scale-free network ? P(k)ltk-g, with gin 2.2,
    gout 2.2
  • similar scaling behavior of connectivity
    distribution
  • Fig. 2d, connectivity distribution averaged over
    43 organisms
  • Suggested that metabolic networks belong to the
    class of scale-
  • free networks

http//ergo.integratedgenomics.com/IGwit/
It is interesting to notice that most of the real
networks have 1 lt g lt 3.
30
Random Graph, Scale-free network, Hierarchical
network
  • Hierarchical network -
  • coexistence of
  • modularity,
  • local clustering, and
  • scale-free behavior

Node degree distribution
  • scaling Cave(k) k-b
  • 1 for Deterministic
  • hierarchical network
  • model

Clustering coefficient
31
Graph Theory Network motifs
  • Compared the abundance of small loops in E. coli
    transcription regulatory network to its
    randomized counterpart
  • Treat the transcription network as directed
    graph
  • ? node operon (a group of contiguous genes)
  • ? edge from an operon that encode an TF to
    an operon regulated by that TF
  • Frequency of occurrences three types of motifs
    (feed-forward loops, single input module, and
    dense overlapping regulons) are much higher than
    the random network version
  • There are 13 types of 3-node connected, directed
    subgraphs
  • Feed-Forward Loops (FFL) were significantly
    over-represented (40 in real vs 7/- 5 in random)

Reference S.S. Shen-Orr, R. Milo, S. Mangan,
and U. Alon, Nature Genetics, 31(1)64-68 (2002)
32
Graph Theory Network motifs
  • Feed-forward loop
  • A TF X regulate a second TF Y, and both jointly
    regulated
  • one or more operons Z1,.. Zn.
  • Single input module (SIM)
  • A single TF X regulates a set of operons Z1,..
    Zn. X is
  • usually autoregulate
  • Dense overlapping regulons (DOR)
  • A set of operons Z1,.. Zm are each regulated by
    a
  • combination of a set of TFs, X1,.. Xn.

33
Graph Theory Node degree correlation
  • random graph models ? node degrees are
    uncorrelated
  • count the frequency P(K1,K2) that two proteins
    with connectivity K1 and K2 connected to each
    other by a link
  • compared it to the same quantity PR(K1,K2)
    measured in a randomized version of the same
    network.
  • The average node connectivity for a fixed K1 is
    given by,.
  • where ltgt denotes the multiple sampling average,
    and the summation sums for all K2 with a fixed
    K1.
  • - In the randomized version, the node degrees of
    each protein are kept the same as in the original
    network, whereas their linking partner is totally
    random.

34
Input - Database of Interacting Proteins (DIP)
  • DIP http//dip.doe-mbi.ucla.edu
  • DIP is a database that documents experimentally
    determined protein-protein interactions. We
    analyze the protein-protein interaction for seven
    different species, C.elegan, D. melanogaster, E.
    coli, H. pylori, H. sapiens, M. musculus and S.
    cerevisiae.

- Look for general and different features of PIN
for different species
35
Input - Database of Interacting Proteins (DIP)
36
Results scale-free network study
  • Large standard deviation of k
  • Coefficient of determination, r2 SSR/SST gt0.90
  • To account for the flat plateau and long tail
    behaviors, assume a short-length scale correction
    k0 and an exponential cut-off tail at kc

yeast g 2.1 fly g 1.9
37
Results
Highly connected proteins (k?25) yeast (39
sequences) and fly (317 sequences) Most of the
sequences do not have high sequence similarity
(E-value ? 0.01) ? different functions
38
Results
These highly connected proteins are
pair-wise compared in an all-against-all manner
using gapped BLAST (16), and none of the
sequences shown significant sequences similarity
(E-value lt 0.001) except the tryptophan protein
and SEC27 protein, nuclear pore protein, 26S
proteasome regulatory particle chain and
DNA-directed RNA polymerase.
39
Results
Fig. 4. The logarithm of the normalized
frequency distribution of connected paths vs the
logarithm of their length for S.
cerevisiae(CORE), H. pylori, E. coil, H. sapiens,
M. musculus and D. melanogaster.
40
Results node degrees correlation
2.0
Highly connected proteins are unlikely to
interact.
2.0
2.0
2.0
1.0
2.0
2.0
2.0
2.0
2.0
1.0
2.0
1.0
1.6
41
Results Hierarchical structures

Yeast
E. coli
Cave(k) k-b

The plots of Log Cave(k) vs Log k for the seven
species. All the species exhibit a rather flat
plateau for small values of k, and they fall
rapidly for large k.
42
Results identification of cliques
identify protein complexes compute the
clustering coefficients, find the cliques or
pseudo-cliques
43
Identification of cliques
  • Theorem
  • Let A3ij be the (i,j)-th element of A3. Then a
    vertex Pi belongs to some clique if and only if
    A3ii ?0.
  • Example

and
The non-zero diagonal entries of A3 are a311,
a322 and a344. Consequently, node 1, 2 and 4
belong to cliques. Since a clique must contain at
least three vertices, the graph has only one
clique.
44
Results - protein complexes
Identification of the highest clique degree with
protein complexes We had identified all possible
cliques within the seven PINs. To identify the
relation between cliques and protein complexes,
we only considered cliques with the largest
number of connected proteins in our preliminary
study, and had succeeded in predicting some of
the cliques did correspond to protein complexes
(comparing data from the BIND database).
45
Evolution of Biological Networks
  • Databases DIP and MIPS
  • Motif identification
  • - detecting all n-node subgraphs,
  • i.e. all 2-, 3-, 4- and some 5-
  • node (a set of 28 five-node
  • motifs) motifs in yeast PIN
  • the network consists of 3183
  • yeast proteins encodes 1000 to
  • 1,000,000 copies of the specific
  • motif types

46
Evolution of Biological Networks
  • studied the conservation of 678 (47 of 1443)
  • yeast proteins with an ortholog in each of
  • five higher eukaryotes (A. thaliana, C.elegans,
  • D. melanlgaster, M. musculus and H.sapiens)
  • deposited in the InParanoid database
  • 47 of the 1443 fully connected
  • pentagons (11), in yeast have each of their
  • five proteins components conserved in each
  • of the five higher eukaryotes
  • - this results ? blocks of cohesive motifs
  • tend to be evolutionary conserved

47
Evolution of Biological Networks
Growth Model of a scale-free network PIN - New
proteins nodes are added (genes duplication) -
Preferential attachment
Redundant links are lost (in an asymmetric
fashion)
48
Evolution of Biological Networks
Growth 1. start with m0 nodes 2. add a node with
m edges 3. connect these edges to existing
nodes at time step t tm0 nodes, tm
edges Preferential attachment Probability q of
connection to node i depends on the degree ki of
this node.
m03, m2
This model leads to the power law distribution
P(k) 2m2k-3 k-3
49
Summary
  • Protein-protein interaction Network
  • PINs are not random networks, they have rather
    heterogeneous structures ? highly connected
    protein ? blastp shows that they do not share
    sequence similarity
  • The plots of LogPcum(k) vs Logk study
    indicates that PINs are well approximate by
    scale-free networks
  • a 2 ? A general biological evolution mechanism
    across species ? growth preferential attachment
    model
  • The plots of LogPcum(k) vs Logk for fly and
    yeast seems to have deviation at the small k and
    large k value ? modification of the growth
    preferential attachment model
  • Highly connected proteins are unlikely to
    interact
  • Hierarchical network model is a better
    description for certain species PINs

50
Matrix
  • Permutations
  • A one-to-one mapping of the set 1,2,3,n onto
    itself is called a permutation. We denote the
    permutation s by s j1j2jn.
  • The number of possible permutation is n!, and
    that the set of them is denoted by Sn. For
    example, S2 12, 21, S3 123, 132, 213, 231,
    312, 321.
  • Consider an arbitrary permutation s in Sn s
    j1j2jn. We say s is even or odd according
    whether there is an even or odd number of pairs
    (i,k) for which
  • i gt k but i precedes k in s
  • We then define the sign of s by, written sgn s, by

if s is even if s is odd
51
Matrix
  • Example
  • Consider the permutation s 35142 in S5.
  • 3 and 5 precede and are greater than 1 hence
    (3,1), and (5,1)
  • 3,5 and 4 precede and are greater than 2 hence
    (3,2), (5,2), (4,2)
  • 5 precedes and is greater than 4 hence (5,4)
  • Since exactly 6 pairs satisfy the sgn(s), s is
    even and sgn(s) 1.

52
Matrix
  • Determinant
  • The determinant of matrix A is defined by

where the factors come from successive rows and
so the first subscripts are in the natural order
1,2,n. The sequence of second subscripts form a
permutation s in Sn. Also the sum is summed over
all permutations s in Sn. Example In S2, the
permutation 12 is even and the permutation 21 is
odd det(A) a11a21 a12a21 In S3, the
permutation are 123, 231 and 321 are even, and
the permutation 132, 213 and 312, are odd. det(A)
a11a22a33 a12a23a31 a13a21a32 a13a22a31
a12a21a33 a11a23a32
Permanent of A
where s is always equal to 1
53
Matrix
  • The incidence matrix
  • The incidence matrix T(G) of a graph G with N
    vertices and M edges is the NxM matrix the rows
    and columns of the matrix corresponding to
    vertices and edges, respectively, of G. It is
    defined as,

Example
M is the number of edges
54
Matrix
  • The circuit matrix
  • The circuit matrix C(G) of a graph G, whose
    cycles (circuits) c and edges e are labeled, is a
    cxe matrix the rows and columns of the matrix
    corresponding to circuits and edges,
    respectively, of G. It is defined as,

a b c d e
Example
C1 C2 C3
a
Cycles C1 a,d,e C2 b,c,e C3 a,b,c,d
G
b
e
d
c
55
Principle Components Analysis (PCA)
  • General Outline
  • Suppose you have a microarray dataset composed of
    1000 genes, each of which have an expression
    value over 10 experiments. The dimensionality of
    that dataset is therefore 10 or 1000.
  • The data, though clumped around several central
    points in that hyperspace, will generally tend
    towards one direction. If one were to draw a
    solid line that best describes that direction,
    then that line is the first principle component
    (PC). The result was a space in which the axes
    were the eigenvectors of the covariance matrix of
    the experiments (in this space each point is a
    gene) or the genes (in this space each point is
    an experiment).
  • Any variation that is not captured by that first
    PC is captured by subsequent orthogonal PCs
    (captures the maximum amount of variation left in
    the data).
  • The first 3 PCs could themselves act as Cartesian
    axes. The data they capture can therefore be
    plotted in terms of these axes. Hence there is a
    reduction of dimensionality.
  • When the data is plotted in this manner they are
    said to be plotted in PC-space.
  • References
  • http//www.ucl.ac.uk/oncology/MicroCore/HTML_resou
    rce/PCA_1.htm
  • Draghici S., 2003. Data Analysis Tools for DNA
    Microarray. Chapman Hall/CRC

56
Principle Components Analysis (PCA)
  • PCA is commonly used in microarray research as a
    cluster analysis tool.
  • to capture the variance in a dataset in terms of
    principle components.
  • trying to reduce the dimensionality of the data
    to summarize the most important (i.e. defining)
    parts whilst simultaneously filtering out noise.
  • Normalization, however, can sometimes remove this
    noise and make the data less variant, which could
    affect the ability of PCA to capture data
    structure.
  • PCA can be imposed on datasets to capture the
    cluster structure (just using the first few PC's)
    prior to cluster analysis (e.g. before performing
    k-Means clustering to determine a good value for
    K).

Coordinate transformation (translation
rotation) Most of the variance along the first
eigenvector.
Variance along the second eigenvector is
probably due to noise.
57
Principle Components Analysis (PCA)
  • PCA pay attention to those dimensions that
    account for a large variance in the data and to
    ignore the dimensions in which the data are not
    vary very much.
  • The direction of PC is determined by calculating
    the eigenvectors of the covariance matrix of the
    pattern.
  • Eigenvector of a matrix A is defined as a vector
    z such as
  • Az lz
  • where l is the eignevalue
  • For example,

has eignevalues l1 -1 and l2 -2, and
eigenvectors z1(1, 0) and z2 (1, -1)
  • Similarly expression for l2 -2
  • In other words, covariance matrix captures the
    shape of the set of data
  • Eigenvalue with largest absolute value implies
    that the data have the largest
  • variance along its eignevector

58
Principle Components Analysis (PCA)
  • How to calculate the eigenvalues
  • of a matrix ? Solve

How to find eigenvectors ?
Up to a multiple constant
Normalization of eignevectors ? 1 ?
59
Eigenvalues and eigenvectors
  • If A is a square matrix, then l is an eigenvalue
    of A if, for some nonzero vector
  • Av lv
  • where v is an eigenvector of A belonging to l.
  • Linear dependence
  • The vectors v1, , vm are said to be linearly
    dependent if there exist scalars a1, , am not
    all of them 0, such that
  • a1v1 . amvm 0
  • Otherwise, the vectors are said to be linearly
    independent.
  • Example
  • The vectors u (1, -1, 0), v(1,3,-1) and
    w(5,3,-2) are dependent since 3u2v-w0.
  • Theorem
  • Nonzero eignevectors belonging to distinct
    eignevalues are linearly independent.

60
Eigenvalues and eigenvectors
  • Theorem
  • An n-square matrix A is similar to a diagonal
    matrix B if and only if A has n linearly
    independent eigenvectors. In this case the
    diagonal elements of B are the corresponding
    eigenvalues.
  • In the above theorem, if we let P be the matrix
    whose columns are the n independent eigenvectors
    of A, then B P-1AP.
  • Example Consider the matrix .
  • A has two independent eigenvectors and
    .
  • Set and so
    .
  • Then A is similar to the diagonal matrix

As expected, the diagonal elements 4 and -1 of
the diagonal matrix B are eigenvalues
corresponding to the given eigenvectors.
61
Characteristic Polynomial Cayley-Hamilton
Theorem
  • The matrix tI-A, where I is the n-square matrix
    and t is a constant, is called the characteristic
    matrix of A. Its determinant DA(t) det(tI A)
    which is a polynomial in t is called the
    characteristic polynomial of A.
  • Cayley-Hamilton Theorem
  • Every matrix is a zero of its characteristic
    polynomial.
  • Example
  • The characteristic polynomial of the matrix
    is
  • D(t) (t-1)(t-2) (-2)(-3) t2 3t 4. As
    expected, A is a zero of D(A)

62
Singular Value Decomposition
  • Performing PCA is the equivalent of performing
    Singular Value Decomposition (SVD) on the
    covariance matrix of the data.
  • Applications Image processing and compression,
    Information Retrieval, Immunology, Molecular
    dynamics, least square error
  • What is SVD? (See MIT paper at the bottom of this
    page for more depth)
  • For the sake of example, let A be a nxp matrix
    that represents gene expression experiments such
    that the rows are genes and columns are
    experiments (i.e.arrays). The SVD of A is said to
    be the factorization
  • A USVT (Eq. 1)(Note that VT means 'the
    Transpose of matrix V').
  • How to determine U, V and S ?
  • In equation 1, matrices U and V are such that
    they are orthogonal (their columns are
    orthonormal, UTU I, VTV I). The columns of U
    are called left singular values (gene
    coefficients) and the rows of VT are called right
    singular values (expression level vectors).
  • To calculate the matrices U and V, one must
    calculate the eigenvectors and eigenvalues of AAT
    and ATA. These multiplications of A by its
    transpose results in a square matrix (the number
    of columns is equal to the number of rows).

63
Singular Value Decomposition
  • The eigenvectors of AAT (a nxn matrix) ? columns
    of U (a nxn matrix)
  • The eigenvectors of ATA (a pxp matrix) ? columns
    of V (a pxp matrix)
  • The eigenvalues of AAT or ATA, when
    square-rooted (s1? s2 ? ) ? the columns of S
    (nxp matrix)
  • The diagonal of S is said to be the singular
    values (min(n,p)) of the original matrix, A.
  • Each eigenvector described above represents a
    principle component. PC1 (Principle Component 1),
    which is defined as the eigenvector with the
    highest corresponding eigenvalue. The individual
    eigenvalues are numerically related to the
    variance they capture via PC's - the higher the
    value, the more variance they have captured.
  • Please also see Yeung Ruzzo (2001), where it is
    shown that lower PC's can capture data structure.
  • http//web.mit.edu/be.400/www/SVD/Singular_Value_D
    ecomposition.htm
  • http//www.netlib.org/lapack/lug/node53.html or
    http//kwon3d.com/theory/jkinem/svd.html

64
Singular Value Decomposition
Example
eigenvectors of AAT
0.34
eigenvalues of AAT or ATA
eigenvectors of ATA
65
Yeast Two-hybrid System
pubs.acs.org/cen/coverstory/ 7831/7831scit1.html
Have two plasmids    - fusion of target to
activation domain (AD).    - fusion of bait to
DNA binding domain (BD).If the target protein
bind to the bait protein it brings AD close to
BD.BD bind to the GAL4 promoter and brings AD
close so it can activate the DNA.LacZ is
expressed from a GAL4 promoter when proteins are
interacting. LacZ will make the yeast blue.
http//cmbi.bjmu.edu.cn/
66
Immunoprecipitation
  • Antibody added to a mixture radiolabel() or
    unlabeled proteins binds specifically to its
    antigen (A) (left tube).
  • Antibody-antigen complex is absorbed from
    solution through the addition of an immobilized
    antibody binding protein such as Protein
    A-Sepharose beads (middle panel).
  • Upon centrifugation, the antibody-antigen
    complex is brought down in the pellet (right
    panel).
  • Subsequent liberation of the antigen can be
    achieved by boiling the sample in the presence of
    SDS.

67
Co-Immunoprecipitation
  • Co-IP vs. IPCo-immunoprecipitation (Co-IP) is a
    popular technique for protein interaction
    discovery. Co-IP is conducted in essentially the
    same manner as an IP. However, in a co-IP the
    target antigen precipitated by the antibody
    co-precipitates a binding partner/protein
    complex from a lysate (?????), i.e., the
    interacting protein is bound to the target
    antigen, which becomes bound by the antibody that
    becomes captured on the Protein A or G gel
    support.
Write a Comment
User Comments (0)
About PowerShow.com