Title: Christine Steinhoff
Slide 1: Lecture Part 3: Principal Component Analysis and its Application in Bioinformatics
Christine Steinhoff
Slide 2:
Part 1: Linear Algebra Basics
Part 2: Principal Component Analysis: An Introduction
Part 3: Principal Component Analysis: Examples
Slide 3: Part 1: Linear Algebra Basics
Slide 4: OUTLINE
What do we need from linear algebra for understanding principal component analysis?
- Motivation
- Standard deviation, variance, covariance
- The covariance matrix
- Symmetric matrices and orthogonality
- Eigenvalues and eigenvectors
- Properties
Slide 5: Motivation
Slide 6: Motivation
Proteins 1 and 2 measured for 200 patients.
[Scatter plot: Protein 1 (x-axis) vs. Protein 2 (y-axis)]
Slide 7: Motivation
[Data matrix: patients 1..200, genes 1..22,000]
Microarray experiment: how do we visualize it? Which genes are important? For which subgroup of patients?
Slide 8: Motivation
[Data matrix: genes 1..200, patients 1..10]
Slide 9: Basics for Principal Component Analysis
- Orthogonal/orthonormal vectors
- Some theorems
- Standard deviation, variance, covariance
- The covariance matrix
- Eigenvalues and eigenvectors
Slide 10: Standard Deviation
The standard deviation is (roughly) the average distance from the mean of the data set to a point:
s = sqrt( (1/(n-1)) * sum_i (x_i - mean)^2 )
Example: Measurement 1: 0, 8, 12, 20; Measurement 2: 8, 9, 11, 12
M1: mean = 10, SD = 8.33
M2: mean = 10, SD = 1.83
Slide 11: Variance
Var = s^2 = (1/(n-1)) * sum_i (x_i - mean)^2
Example: Measurement 1: 0, 8, 12, 20; Measurement 2: 8, 9, 11, 12
M1: mean = 10, SD = 8.33, Var = 69.33
M2: mean = 10, SD = 1.83, Var = 3.33
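As a quick numerical check of this example, a minimal R sketch (the names m1 and m2 simply label the two measurements):

m1 <- c(0, 8, 12, 20)
m2 <- c(8, 9, 11, 12)
mean(m1); mean(m2)   # both 10
sd(m1); sd(m2)       # 8.33 and 1.83 (sample SD, divides by n-1)
var(m1); var(m2)     # 69.33 and 3.33 (sample variance)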
Slide 12: Covariance
Standard deviation and variance are 1-dimensional. How much do the dimensions vary from the mean with respect to each other? Covariance measures this between 2 dimensions:
Cov(X, Y) = (1/(n-1)) * sum_i (x_i - mean(X)) * (y_i - mean(Y))
We easily see that if X = Y we end up with the variance.
Slide 13: Covariance Matrix
Let X = (X_1, ..., X_p)^T be a random vector. Then the covariance matrix of X, denoted by Cov(X), is the matrix whose (i, j) entry is Cov(X_i, X_j). The diagonal entries of Cov(X) are the variances Var(X_i). In matrix notation,
Cov(X) = E[ (X - E[X]) (X - E[X])^T ]
The covariance matrix is symmetric.
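In R, the sample covariance matrix of a data matrix (observations in rows, variables in columns) can be inspected directly; a small sketch reusing the two measurements from the earlier example:

X <- cbind(m1 = c(0, 8, 12, 20),
           m2 = c(8, 9, 11, 12))
C <- cov(X)              # 2 x 2 sample covariance matrix
diag(C)                  # variances of m1 and m2 on the diagonal
all.equal(C, t(C))       # TRUE: the covariance matrix is symmetric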
Slide 14: Symmetric Matrix
Let A = (a_ij) be a square matrix of size n x n. The matrix A is symmetric if a_ij = a_ji for all i, j.
15Orthogonality/Orthonormality
ltv1,v2gt lt(1 0),(0 1)gt 0
Two vectors v1 and v2 for which ltv1,v2gt0 holds
are said to be orthogonal
Unit vectors which are orthogonal are said to be
orthonormal.
Slide 16: Eigenvalues/Eigenvectors
Let A be an n x n square matrix and x an n x 1 column vector. Then a (right) eigenvector of A is a nonzero vector x such that
A x = lambda x
for some scalar lambda.
Procedure:
- Finding the eigenvalues: solve det(A - lambda I) = 0
- Finding corresponding eigenvectors: for each lambda, solve (A - lambda I) x = 0
R: eigen(matrix); Matlab: eig(matrix)
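A minimal R example of the eigendecomposition (the 2 x 2 matrix here is just an illustration):

A <- matrix(c(2, 1,
              1, 2), nrow = 2, byrow = TRUE)
e <- eigen(A)
e$values                 # eigenvalues: 3 and 1
e$vectors                # corresponding eigenvectors in the columns
A %*% e$vectors[, 1]     # the same as e$values[1] * e$vectors[, 1]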
17Some Remarks
If A and B are matrices whose sizes are such that
the given operations are defined and c is any
scalar then,
Slide 18: Now...
We have enough definitions to go through the procedure of how to perform Principal Component Analysis.
Slide 19: Part 2: Principal Component Analysis: An Introduction
Slide 20: OUTLINE
What is principal component analysis good for?
Principal Component Analysis (PCA):
- One toy example: a spring
- The basic idea of Principal Component Analysis
- The idea of transformation
- How to get there? The mathematics part
- Some remarks
- Basic algorithmic procedure
Slide 21: Idea of PCA
We often do not know which measurements best reflect the dynamics in our system.
[Figure: the toy example of a spring, from http://www.snl.salk.edu/shlens/pub/notes/pca.pdf]
Slide 22: Idea of PCA
We often do not know which measurements best reflect the dynamics in our system.
So, PCA should reveal: the dynamics are along the x-axis.
Slide 23: Idea of PCA
- Introduced by Pearson (1901) and Hotelling (1933) to describe the variation in a set of multivariate data in terms of a set of uncorrelated variables
- We typically have a data matrix of n observations on p correlated variables x1, x2, ..., xp
- PCA looks for a transformation of the xi into p new variables yi that are uncorrelated
Slide 24: Idea
[Data matrix X: patients 1..n, genes x1..xp]
The dimension is high. So how can we reduce the dimension? Simplest way: take the first one, two, or three variables, plot them, and discard the rest. Obviously a very bad idea.
Slide 25: Transformation
We want to find a transformation that involves ALL columns, not only the first ones. So find a new basis and order it such that the first component carries almost ALL the information of the whole dataset.
We are looking for a transformation of the data matrix X (p x n) such that
Y = alpha^T X = alpha_1 X_1 + alpha_2 X_2 + ... + alpha_p X_p
26Transformation
What is a reasonable choice for the ? ? Remember
We wanted a transformation that maximizes
information That means captures Variance in
the data
Maximize the variance of the projection of the
observations on the Y variables ! Find ? such
that Var(?T X) is maximal The matrix
CVar(X) is the covariance matrix of the Xi
variables
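A small R sketch of this idea on simulated data (the data and variable names are illustrative, not from the slides): the direction alpha that maximizes the variance of the projection is the leading eigenvector of the covariance matrix.

set.seed(1)
X <- cbind(x1 = rnorm(200), x2 = rnorm(200))
X[, 2] <- X[, 1] + 0.3 * X[, 2]       # make the two variables correlated
C <- cov(X)
alpha <- eigen(C)$vectors[, 1]        # direction of maximal variance
var(X %*% alpha)                      # largest variance of any unit-length projection
var(X %*% c(1, 0))                    # projecting on x1 alone captures less variance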
Slide 27: Transformation
Can we see that intuitively in a picture?
[Two scatter plots with candidate projection directions, labelled "Good" and "Better"]
Slide 28: Transformation
[Scatter plot: PC1 and PC2 drawn as orthogonal axes, illustrating orthogonality]
Slide 29: How do we get there?
[Data matrix X: patients 1..n, genes x1..xp]
X is a real-valued p x n matrix. Cov(X) is a real-valued p x p matrix or n x n matrix -> decide whether you want to analyse patient groups or gene groups.
Slide 30: How do we get there?
Let's decide for genes: compute the p x p matrix Cov(X).
Slide 31: How do we get there?
Some features of Cov(X):
- Cov(X) is a symmetric p x p matrix
- The diagonal terms of Cov(X) are the variances of the genes across patients
- The off-diagonal terms of Cov(X) are the covariances between gene vectors
- Cov(X) captures the correlations between all possible pairs of measurements
- In the diagonal terms, by assumption, large values correspond to interesting dynamics
- In the off-diagonal terms, large values correspond to high redundancy
Slide 32: How do we get there?
The principal components of X are the eigenvectors of Cov(X).
Assume we can manipulate X a bit; let's call the result Y. Y should be manipulated in a way that makes it a bit more optimal than X was. What does optimal mean? It means: in Cov(Y), the off-diagonal covariances should be SMALL and the diagonal variances should be LARGE.
In other words, Cov(Y) should be diagonal, with large values on the diagonal.
Slide 33: How do we get there?
The manipulation is a change of basis with orthonormal vectors, and they are ordered in a way that the most important one comes first (hence "principal"). How do we put this in mathematical terms?
Find an orthonormal P such that
Y = P X
with Cov(Y) diagonalized. Then the rows of P are the principal components of X.
Slide 34: How do we get there?
Cov(Y) = (1/(n-1)) Y Y^T = (1/(n-1)) (P X)(P X)^T = (1/(n-1)) P X X^T P^T = (1/(n-1)) P A P^T
where A = X X^T.
Slide 35: How do we get there?
A is symmetric. Therefore there is a matrix E of eigenvectors and a diagonal matrix D such that
A = E D E^T
Now define P to be the transpose of the matrix E of eigenvectors, P = E^T. Then we can write
A = P^T D P
Slide 36: How do we get there?
Now we can go back to our covariance expression:
Cov(Y) = (1/(n-1)) P A P^T = (1/(n-1)) P (P^T D P) P^T
Slide 37: How do we get there?
The inverse of an orthogonal matrix is its transpose (due to its definition). In our context that means P^T = P^(-1), so
Cov(Y) = (1/(n-1)) (P P^(-1)) D (P P^(-1)) = (1/(n-1)) D
Slide 38: How do we get there?
P diagonalizes Cov(Y), where P is the transpose of the matrix of eigenvectors of X X^T. The principal components of X are the eigenvectors of X X^T (that is the same as the rows of P). The i-th diagonal value of Cov(Y) is the variance of X along p_i (along the i-th principal component).
Essentially, we need to compute the EIGENVALUES (the explained variance) and the EIGENVECTORS (the principal components) of the covariance matrix of the original matrix X.
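This chain of equalities can be checked numerically in R: the eigenvectors of the covariance matrix agree (up to sign) with the components returned by the built-in prcomp, and the eigenvalues are the variances along those components. A sketch with simulated data:

set.seed(1)
X  <- matrix(rnorm(200 * 5), nrow = 200)       # 200 observations, 5 variables
e  <- eigen(cov(X))                            # eigen-decomposition of Cov(X)
pc <- prcomp(X, center = TRUE, scale. = FALSE)
e$values                                       # variances along the principal components
pc$sdev^2                                      # the same values
max(abs(abs(e$vectors) - abs(pc$rotation)))    # ~0: same directions up to sign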
Slide 39: Some Remarks
- If you multiply one variable by a scalar you get different results: this is because PCA uses the covariance matrix (and not the correlation matrix)
- PCA should therefore be applied to data that have approximately the same scale in each variable
- The relative variance explained by each PC is given by eigenvalue / sum(eigenvalues) (see the sketch after this list)
- When to stop? For example: keep enough PCs to reach a cumulative variance explained by the PCs that is > 50-70%
- Kaiser criterion: keep PCs with eigenvalues > 1
Slide 40: Some Remarks
Slide 41: Some Remarks
If variables have very heterogeneous variances, we standardize them. The standardized variables are
Xi' = (Xi - mean(Xi)) / sqrt(Var(Xi))
The new variables all have the same variance, so each variable has the same weight.
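In R this standardization is a single call; a minimal sketch (the iris columns are just an example numeric matrix):

X  <- as.matrix(iris[, 1:4])                  # example numeric data matrix
Xs <- scale(X, center = TRUE, scale = TRUE)   # (X - mean) / sqrt(variance), column-wise
apply(Xs, 2, var)                             # every column now has variance 1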
Slide 42: REMARKS
- PCA is useful for finding new, more informative, uncorrelated features; it reduces dimensionality by rejecting low-variance features
- PCA is only powerful if the biological question is related to the highest variance in the dataset
43Algorithm
Data (Data.old mean ) /sqrt(variance)
Cov(data) 1/(N-1) Datatr(Data)
Find Eigenvector/Eigenvalue (Function in R and
matlab eig) and sort
Eigenvectors V Eigenvalues P
Project the original data P data
Plot as many components as necessary
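Written out in R, the procedure above might look like this (a sketch, not the original course code; Data.old is assumed to hold the p variables in rows and the N observations in columns, as in the slides, and the iris data only serve as an example):

Data.old <- t(as.matrix(iris[, 1:4]))      # example: p variables (rows) x N observations (columns)
N    <- ncol(Data.old)
Data <- (Data.old - rowMeans(Data.old)) / apply(Data.old, 1, sd)   # standardize each variable
C    <- (1 / (N - 1)) * Data %*% t(Data)   # p x p covariance (here: correlation) matrix
e    <- eigen(C)                           # eigenvalues are returned sorted, largest first
P    <- t(e$vectors)                       # rows of P are the principal components
Y    <- P %*% Data                         # project the original data
plot(Y[1, ], Y[2, ], xlab = "PC1", ylab = "PC2")   # plot as many components as necessary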
Slide 44: Part 3: Principal Component Analysis: Examples
Slide 45: OUTLINE
Principal component analysis in bioinformatics
Slide 47: Example 1
Slide 48: Lefkovits et al.
T cells belong to a group of white blood cells known as lymphocytes and play a central role in cell-mediated immunity.
Slide 49: Lefkovits et al.
Slide 50: Lefkovits et al.
Slide 51: Lefkovits et al.
[Data matrix X: clones 1..n, spots x1..xp]
X is a real-valued p x n matrix. They want to analyse the relatedness of clones, so Cov(X) is a real-valued n x n matrix. They take the correlation matrix (which is, on top of the covariance, the division by the standard deviations).
Slide 52: Lefkovits et al.
Slide 53: Example 2
Slide 54: Yang et al.
- Transforming growth factor-beta (TGF-beta)
- TGF-beta is a potent inducer of growth arrest in many cell types, including epithelial cells.
- This activity is the basis of the tumor suppressor role of the TGF-beta signaling system in carcinomas.
- It can also contribute to cancer progression.
- It has special relevance in mesenchymal differentiation, including bone development.
- Deregulated expression or activation of components of this signaling system can contribute to skeletal diseases, e.g. osteoarthritis.
Slide 55: Yang et al.
Stock 1 (T): constitutively active tkv receptor
Stock 2 (B): constitutively active babo receptor
Samples: T1, T2, T3; B1, B2, B3; Contr1, Contr2, Contr3
[Data matrix: samples by genes x1..xp]
Slide 56: Yang et al.
Filter genes:
- only expressed (present) genes
- genes that show at least some effect when comparing the three groups
Slide 57: Yang et al.
Slide 58: Yang et al.
[PCA plot: samples grouped into tkv, babo, and control]
Slide 59: Ulloa-Montoya et al.
- Multipotent adult progenitor cells
- Pluripotent embryonic stem cells
- Mesenchymal stem cells
Slide 60: Ulloa-Montoya et al.
Slide 62: Yang et al.
But: we only see the different experiments. If we do it the other way round, that is, analysing the genes instead of the experiments, we see a grouping of genes. But we never see both together.
So, can we somehow relate the experiments and the genes? That means: group genes whose expression might be explained by the respective experimental group (tkv, babo, control)?
This leads to correspondence analysis.
64Vectorspace and Basis
- Let F be a field (for example real numbers) whose
elements are called scalars. - A vector space over the field F is a set V
together with the operations - vector addition V V ? V denoted v
w, where v, w ? V, and - scalar multiplication F V ? V denoted av,
where a ? F and v ? V, - Satisfying
- Vector addition is associative (u, v, w ? V, u
(v w) (u v) w) - Vector addition is commutative (v, w ? V, v w
w v) - Vector addition has an identity element (0 ? V,
such that v 0 v for all v ? V) - Vector addition has an inverse element
- (v ? V, there exists w ? V, such that v w
0. - Distributivity holds for scalar multiplication
over vector addition - (a ? F and v, w ? V, a(v w) a v a w)
- Distributivity holds for scalar multiplication
over field addition - (a, b ? F and v ? V, (a b) v a v b v)
- Scalar multiplication is compatible with
multiplication in the field of scalars - (a, b ? F and v ? V, a (b v) (ab) v)
- Scalar multiplication has an identity element
65Vectorspace and Basis
- A basis of V is a linearly independent set of
vectors in V which spans V. - Example Fn the standard basis
- V is finite dimensional if there is a finite
basis. Dimension of V is the number of elements
of a basis. (Independent of the choice of basis.)
66Orthogonal
An orthogonal matrix is a square matrix Q whose
transpose is its inverse
Matrix is orthogonally diagonalizable that
is, there exists an orthogonal matrix such
that
Orthogonal vectors inner product is zero
ltv,vgt0 Orthonormal vectors orthogonality and
length 1
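A tiny R check of these facts for the eigenvector matrix of a symmetric (covariance) matrix; the iris data again only serve as an illustration:

C <- cov(as.matrix(iris[, 1:4]))    # a symmetric matrix
Q <- eigen(C)$vectors               # eigenvectors of a symmetric matrix form an orthonormal set
round(t(Q) %*% Q, 10)               # the identity matrix: Q^T = Q^(-1)
round(t(Q) %*% C %*% Q, 10)         # a diagonal matrix: C is orthogonally diagonalized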