Title: Christine Steinhoff
Slide 1: Lecture Part 3: Principal Component Analysis and its Application in Bioinformatics
Christine Steinhoff
Slide 2:
Part 1: Linear Algebra Basics
Part 2: Principal Component Analysis: An Introduction
Part 3: Principal Component Analysis: Examples
Slide 3: Part 1: Linear Algebra Basics
Slide 4: OUTLINE
What do we need from linear algebra for understanding principal component analysis?
- Motivation
- Standard deviation, variance, covariance
- The covariance matrix
- Symmetric matrices and orthogonality
- Eigenvalues and eigenvectors
- Properties
Slide 5: Motivation
Slide 6: Motivation
Proteins 1 and 2 measured for 200 patients.
[Scatter plot: Protein 1 (x-axis) vs. Protein 2 (y-axis)]
Slide 7: Motivation
[Data matrix: patients 1..200, genes 1..22,000]
Microarray experiment: how do we visualize it? Which genes are important? For which subgroup of patients?
Slide 8: Motivation
[Data matrix: genes 1..200, patients 1..10]
Slide 9: Basics for Principal Component Analysis
- Orthogonal/orthonormal vectors
- Some theorems
- Standard deviation, variance, covariance
- The covariance matrix
- Eigenvalues and eigenvectors
Slide 10: Standard Deviation
The standard deviation is (roughly) the average distance from the mean of the data set to a point:
s = sqrt( (1/(n-1)) * sum_i (x_i - mean)^2 )
Example: Measurement 1: 0, 8, 12, 20; Measurement 2: 8, 9, 11, 12
M1: mean = 10, SD = 8.33
M2: mean = 10, SD = 1.83
Slide 11: Variance
Var = s^2 = (1/(n-1)) * sum_i (x_i - mean)^2
Example: Measurement 1: 0, 8, 12, 20; Measurement 2: 8, 9, 11, 12
M1: mean = 10, SD = 8.33, Var = 69.33
M2: mean = 10, SD = 1.83, Var = 3.33
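As a quick numerical check of this example, a minimal R sketch (the names m1 and m2 simply label the two measurements):

m1 <- c(0, 8, 12, 20)
m2 <- c(8, 9, 11, 12)
mean(m1); mean(m2)   # both 10
sd(m1); sd(m2)       # 8.33 and 1.83 (sample SD, divides by n-1)
var(m1); var(m2)     # 69.33 and 3.33 (sample variance)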
Slide 12: Covariance
Standard deviation and variance are 1-dimensional. How much do the dimensions vary from the mean with respect to each other? Covariance measures this between 2 dimensions:
Cov(X, Y) = (1/(n-1)) * sum_i (x_i - mean(X)) * (y_i - mean(Y))
We easily see that if X = Y we end up with the variance.
Slide 13: Covariance Matrix
Let X = (X_1, ..., X_p)^T be a random vector. Then the covariance matrix of X, denoted by Cov(X), is the matrix whose (i, j) entry is Cov(X_i, X_j). The diagonal entries of Cov(X) are the variances Var(X_i). In matrix notation,
Cov(X) = E[ (X - E[X]) (X - E[X])^T ]
The covariance matrix is symmetric.
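In R, the sample covariance matrix of a data matrix (observations in rows, variables in columns) can be inspected directly; a small sketch reusing the two measurements from the earlier example:

X <- cbind(m1 = c(0, 8, 12, 20),
           m2 = c(8, 9, 11, 12))
C <- cov(X)              # 2 x 2 sample covariance matrix
diag(C)                  # variances of m1 and m2 on the diagonal
all.equal(C, t(C))       # TRUE: the covariance matrix is symmetric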
Slide 14: Symmetric Matrix
Let A = (a_ij) be a square matrix of size n x n. The matrix A is symmetric if a_ij = a_ji for all i, j.
15Orthogonality/Orthonormality
ltv1,v2gt lt(1 0),(0 1)gt 0
Two vectors v1 and v2 for which ltv1,v2gt0 holds
are said to be orthogonal
Unit vectors which are orthogonal are said to be
orthonormal.
Slide 16: Eigenvalues/Eigenvectors
Let A be an n x n square matrix and x an n x 1 column vector. Then a (right) eigenvector of A is a nonzero vector x such that
A x = lambda x
for some scalar lambda.
Procedure:
- Finding the eigenvalues: solve det(A - lambda I) = 0
- Finding corresponding eigenvectors: for each lambda, solve (A - lambda I) x = 0
R: eigen(matrix); Matlab: eig(matrix)
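A minimal R example of the eigendecomposition (the 2 x 2 matrix here is just an illustration):

A <- matrix(c(2, 1,
              1, 2), nrow = 2, byrow = TRUE)
e <- eigen(A)
e$values                 # eigenvalues: 3 and 1
e$vectors                # corresponding eigenvectors in the columns
A %*% e$vectors[, 1]     # the same as e$values[1] * e$vectors[, 1]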
17Some Remarks
If A and B are matrices whose sizes are such that
the given operations are defined and c is any
scalar then,
Slide 18: Now...
We have enough definitions to go through the procedure of how to perform Principal Component Analysis.
Slide 19: Part 2: Principal Component Analysis: An Introduction
Slide 20: OUTLINE
What is principal component analysis good for?
Principal Component Analysis (PCA):
- One toy example: a spring
- The basic idea of Principal Component Analysis
- The idea of transformation
- How to get there? The mathematics part
- Some remarks
- Basic algorithmic procedure
Slide 21: Idea of PCA
We often do not know which measurements best reflect the dynamics in our system.
[Figure: the toy example of a spring, from http://www.snl.salk.edu/shlens/pub/notes/pca.pdf]
Slide 22: Idea of PCA
We often do not know which measurements best reflect the dynamics in our system.
So, PCA should reveal: the dynamics are along the x-axis.
Slide 23: Idea of PCA
- Introduced by Pearson (1901) and Hotelling (1933) to describe the variation in a set of multivariate data in terms of a set of uncorrelated variables
- We typically have a data matrix of n observations on p correlated variables x1, x2, ..., xp
- PCA looks for a transformation of the xi into p new variables yi that are uncorrelated
Slide 24: Idea
[Data matrix X: patients 1..n, genes x1..xp]
The dimension is high. So how can we reduce the dimension? Simplest way: take the first one, two, or three variables, plot them, and discard the rest. Obviously a very bad idea.
Slide 25: Transformation
We want to find a transformation that involves ALL columns, not only the first ones. So find a new basis and order it such that the first component carries almost ALL the information of the whole dataset.
We are looking for a transformation of the data matrix X (p x n) such that
Y = alpha^T X = alpha_1 X_1 + alpha_2 X_2 + ... + alpha_p X_p
26Transformation
What is a reasonable choice for the ? ? Remember
We wanted a transformation that maximizes
information That means captures Variance in
the data
Maximize the variance of the projection of the
observations on the Y variables ! Find ? such
that Var(?T X) is maximal The matrix
CVar(X) is the covariance matrix of the Xi
variables
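A small R sketch of this idea on simulated data (the data and variable names are illustrative, not from the slides): the direction alpha that maximizes the variance of the projection is the leading eigenvector of the covariance matrix.

set.seed(1)
X <- cbind(x1 = rnorm(200), x2 = rnorm(200))
X[, 2] <- X[, 1] + 0.3 * X[, 2]       # make the two variables correlated
C <- cov(X)
alpha <- eigen(C)$vectors[, 1]        # direction of maximal variance
var(X %*% alpha)                      # largest variance of any unit-length projection
var(X %*% c(1, 0))                    # projecting on x1 alone captures less variance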
Slide 27: Transformation
Can we see that intuitively in a picture?
[Two scatter plots with candidate projection directions, labelled "Good" and "Better"]
Slide 28: Transformation
[Scatter plot: PC1 and PC2 drawn as orthogonal axes, illustrating orthogonality]
Slide 29: How do we get there?
[Data matrix X: patients 1..n, genes x1..xp]
X is a real-valued p x n matrix. Cov(X) is a real-valued p x p matrix or n x n matrix -> decide whether you want to analyse patient groups or gene groups.
Slide 30: How do we get there?
Let's decide for genes: compute the p x p matrix Cov(X).
Slide 31: How do we get there?
Some features of Cov(X):
- Cov(X) is a symmetric p x p matrix
- The diagonal terms of Cov(X) are the variances of the genes across patients
- The off-diagonal terms of Cov(X) are the covariances between gene vectors
- Cov(X) captures the correlations between all possible pairs of measurements
- In the diagonal terms, by assumption, large values correspond to interesting dynamics
- In the off-diagonal terms, large values correspond to high redundancy
Slide 32: How do we get there?
The principal components of X are the eigenvectors of Cov(X).
Assume we can manipulate X a bit; let's call the result Y. Y should be manipulated in a way that makes it a bit more optimal than X was. What does optimal mean? It means: in Cov(Y), the off-diagonal covariances should be SMALL and the diagonal variances should be LARGE.
In other words, Cov(Y) should be diagonal, with large values on the diagonal.
Slide 33: How do we get there?
The manipulation is a change of basis with orthonormal vectors, and they are ordered in a way that the most important one comes first (hence "principal"). How do we put this in mathematical terms?
Find an orthonormal P such that
Y = P X
with Cov(Y) diagonalized. Then the rows of P are the principal components of X.
Slide 34: How do we get there?
Cov(Y) = (1/(n-1)) Y Y^T = (1/(n-1)) (P X)(P X)^T = (1/(n-1)) P X X^T P^T = (1/(n-1)) P A P^T
where A = X X^T.
Slide 35: How do we get there?
A is symmetric. Therefore there is a matrix E of eigenvectors and a diagonal matrix D such that
A = E D E^T
Now define P to be the transpose of the matrix E of eigenvectors, P = E^T. Then we can write
A = P^T D P
Slide 36: How do we get there?
Now we can go back to our covariance expression:
Cov(Y) = (1/(n-1)) P A P^T = (1/(n-1)) P (P^T D P) P^T
Slide 37: How do we get there?
The inverse of an orthogonal matrix is its transpose (due to its definition). In our context that means P^T = P^(-1), so
Cov(Y) = (1/(n-1)) (P P^(-1)) D (P P^(-1)) = (1/(n-1)) D
Slide 38: How do we get there?
P diagonalizes Cov(Y), where P is the transpose of the matrix of eigenvectors of X X^T. The principal components of X are the eigenvectors of X X^T (that is the same as the rows of P). The i-th diagonal value of Cov(Y) is the variance of X along p_i (along the i-th principal component).
Essentially, we need to compute the EIGENVALUES (the explained variance) and the EIGENVECTORS (the principal components) of the covariance matrix of the original matrix X.
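This chain of equalities can be checked numerically in R: the eigenvectors of the covariance matrix agree (up to sign) with the components returned by the built-in prcomp, and the eigenvalues are the variances along those components. A sketch with simulated data:

set.seed(1)
X  <- matrix(rnorm(200 * 5), nrow = 200)       # 200 observations, 5 variables
e  <- eigen(cov(X))                            # eigen-decomposition of Cov(X)
pc <- prcomp(X, center = TRUE, scale. = FALSE)
e$values                                       # variances along the principal components
pc$sdev^2                                      # the same values
max(abs(abs(e$vectors) - abs(pc$rotation)))    # ~0: same directions up to sign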
Slide 39: Some Remarks
- If you multiply one variable by a scalar you get different results: this is because PCA uses the covariance matrix (and not the correlation matrix)
- PCA should therefore be applied to data that have approximately the same scale in each variable
- The relative variance explained by each PC is given by eigenvalue / sum(eigenvalues) (see the sketch after this list)
- When to stop? For example: keep enough PCs to reach a cumulative variance explained by the PCs that is > 50-70%
- Kaiser criterion: keep PCs with eigenvalues > 1
Slide 40: Some Remarks
Slide 41: Some Remarks
If variables have very heterogeneous variances, we standardize them. The standardized variables are
Xi' = (Xi - mean(Xi)) / sqrt(Var(Xi))
The new variables all have the same variance, so each variable has the same weight.
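In R this standardization is a single call; a minimal sketch (the iris columns are just an example numeric matrix):

X  <- as.matrix(iris[, 1:4])                  # example numeric data matrix
Xs <- scale(X, center = TRUE, scale = TRUE)   # (X - mean) / sqrt(variance), column-wise
apply(Xs, 2, var)                             # every column now has variance 1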
Slide 42: REMARKS
- PCA is useful for finding new, more informative, uncorrelated features; it reduces dimensionality by rejecting low-variance features
- PCA is only powerful if the biological question is related to the highest variance in the dataset
43Algorithm
Data (Data.old mean ) /sqrt(variance)
Cov(data) 1/(N-1) Datatr(Data)
Find Eigenvector/Eigenvalue (Function in R and
matlab eig) and sort
Eigenvectors V Eigenvalues P
Project the original data P data
Plot as many components as necessary
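Written out in R, the procedure above might look like this (a sketch, not the original course code; Data.old is assumed to hold the p variables in rows and the N observations in columns, as in the slides, and the iris data only serve as an example):

Data.old <- t(as.matrix(iris[, 1:4]))      # example: p variables (rows) x N observations (columns)
N    <- ncol(Data.old)
Data <- (Data.old - rowMeans(Data.old)) / apply(Data.old, 1, sd)   # standardize each variable
C    <- (1 / (N - 1)) * Data %*% t(Data)   # p x p covariance (here: correlation) matrix
e    <- eigen(C)                           # eigenvalues are returned sorted, largest first
P    <- t(e$vectors)                       # rows of P are the principal components
Y    <- P %*% Data                         # project the original data
plot(Y[1, ], Y[2, ], xlab = "PC1", ylab = "PC2")   # plot as many components as necessary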
Slide 44: Part 3: Principal Component Analysis: Examples
Slide 45: OUTLINE
Principal component analysis in bioinformatics
Slide 47: Example 1
Slide 48: Lefkovits et al.
T cells belong to a group of white blood cells known as lymphocytes and play a central role in cell-mediated immunity.
Slide 49: Lefkovits et al.
Slide 50: Lefkovits et al.
Slide 51: Lefkovits et al.
[Data matrix X: clones 1..n, spots x1..xp]
X is a real-valued p x n matrix. They want to analyse the relatedness of clones, so Cov(X) is a real-valued n x n matrix. They take the correlation matrix (which is, on top of the covariance, the division by the standard deviations).
Slide 52: Lefkovits et al.
Slide 53: Example 2
Slide 54: Yang et al.
- Transforming growth factor-beta (TGF-beta)
- TGF-beta is a potent inducer of growth arrest in many cell types, including epithelial cells.
- This activity is the basis of the tumor suppressor role of the TGF-beta signaling system in carcinomas.
- It can also contribute to cancer progression.
- It has special relevance in mesenchymal differentiation, including bone development.
- Deregulated expression or activation of components of this signaling system can contribute to skeletal diseases, e.g. osteoarthritis.
Slide 55: Yang et al.
Stock 1 (T): constitutively active tkv receptor
Stock 2 (B): constitutively active babo receptor
Samples: T1, T2, T3; B1, B2, B3; Contr1, Contr2, Contr3
[Data matrix: samples by genes x1..xp]
Slide 56: Yang et al.
Filter genes:
- only expressed (present) genes
- genes that show at least some effect when comparing the three groups
Slide 57: Yang et al.
Slide 58: Yang et al.
[PCA plot: samples grouped into tkv, babo, and control]
Slide 59: Ulloa-Montoya et al.
- Multipotent adult progenitor cells
- Pluripotent embryonic stem cells
- Mesenchymal stem cells
Slide 60: Ulloa-Montoya et al.
Slide 62: Yang et al.
But: we only see the different experiments. If we do it the other way round, that is, analysing the genes instead of the experiments, we see a grouping of genes. But we never see both together.
So, can we somehow relate the experiments and the genes? That means: group genes whose expression might be explained by the respective experimental group (tkv, babo, control)?
This leads to correspondence analysis.
64Vectorspace and Basis
- Let F be a field (for example real numbers) whose
elements are called scalars. - A vector space over the field F is a set V
together with the operations - vector addition V V ? V denoted v
w, where v, w ? V, and - scalar multiplication F V ? V denoted av,
where a ? F and v ? V, - Satisfying
- Vector addition is associative (u, v, w ? V, u
(v w) (u v) w) - Vector addition is commutative (v, w ? V, v w
w v) - Vector addition has an identity element (0 ? V,
such that v 0 v for all v ? V) - Vector addition has an inverse element
- (v ? V, there exists w ? V, such that v w
0. - Distributivity holds for scalar multiplication
over vector addition - (a ? F and v, w ? V, a(v w) a v a w)
- Distributivity holds for scalar multiplication
over field addition - (a, b ? F and v ? V, (a b) v a v b v)
- Scalar multiplication is compatible with
multiplication in the field of scalars - (a, b ? F and v ? V, a (b v) (ab) v)
- Scalar multiplication has an identity element
65Vectorspace and Basis
- A basis of V is a linearly independent set of
vectors in V which spans V. - Example Fn the standard basis
- V is finite dimensional if there is a finite
basis. Dimension of V is the number of elements
of a basis. (Independent of the choice of basis.)
66Orthogonal
An orthogonal matrix is a square matrix Q whose
transpose is its inverse
Matrix is orthogonally diagonalizable that
is, there exists an orthogonal matrix such
that
Orthogonal vectors inner product is zero
ltv,vgt0 Orthonormal vectors orthogonality and
length 1
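A tiny R check of these facts for the eigenvector matrix of a symmetric (covariance) matrix; the iris data again only serve as an illustration:

C <- cov(as.matrix(iris[, 1:4]))    # a symmetric matrix
Q <- eigen(C)$vectors               # eigenvectors of a symmetric matrix form an orthonormal set
round(t(Q) %*% Q, 10)               # the identity matrix: Q^T = Q^(-1)
round(t(Q) %*% C %*% Q, 10)         # a diagonal matrix: C is orthogonally diagonalized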