Title: Introduction to Bioinformatics
LECTURE 9 Clustering gene expression
Chapter 9 The genomics of wine-making
9.1 Chateau Hajji Feruz Tepe
Wine making dates back to at least 5000 BC, based on archeological finds at Hajji Feruz Tepe in Iran.
Figure: Overview of Neolithic houses at Hajji Feruz Tepe that yielded six wine jars in the floor along one wall of the room.
Figure: One of six jars once filled with wine from the Neolithic residence at Hajji Feruz Tepe (Iran). Chemical analysis of patches of a reddish residue covering the interior of this vessel showed that it originally held resinated wine.
Recipe for wine making:
1. Fruit juice (or other sugar-rich liquid)
2. Yeast (Saccharomyces cerevisiae)
Yeast (Saccharomyces cerevisiae) is a unicellular fungus found naturally on grapevines. It is responsible for wine making: it ferments sugars and produces alcohol.
From being budded off from its parent cell to producing its own offspring, each yeast cell goes through a number of typical steps that also involve changes in gene expression, turning whole pathways on and off.
Remember, a gene can be seen as an on-off switch, and RNA and proteins are messengers between the genes. If a gene is on, the gene is said to be expressed. The degree to which the gene is expressed is called the expression level of the gene. If a gene is off, it can be said to have expression level zero.
Today the study of such phenomena is possible through microarray technology, which can measure the expression level of every gene in a cell. With the gene expression data, genes can be clustered on the basis of the similarity of their expression profiles.
With water, sugar and flour, yeast ferments the sugars in the dough and produces carbon dioxide (CO2), which causes the dough to rise. In this process it also produces alcohol as a by-product (originally perhaps as near-toxic protection!). When the sugar supply is exhausted, S. cerevisiae must find a new source of energy: when oxygen is available it shifts to respiration, and the alcohol now becomes the source of energy. This change of state is called the diauxic shift.
S. cerevisiae is one of the most studied organisms in biology. It is a complex unicellular eukaryote with a 12.5 Mbp genome in 16 linear chromosomes (not counting the mitochondrial DNA), containing about 6400 genes (about 2000 more than E. coli).
S. cerevisiae can be regarded as a complex factory transforming many raw materials into final products, involving many conveyor belts between the genes. Such a conveyor belt of coupled expressed genes is called a genetic pathway. The diauxic shift means that the whole system has to be transformed from the old process to the new process: entire new pathways are switched on, and old pathways are shut off.
It is therefore useful to monitor the genome-wide expression of S. cerevisiae over time, including the diauxic shift. This monitoring can be done with microarrays, which are among the most important tools in bioinformatics. Other dynamical processes, such as the cell cycle, can also be studied with microarrays. This requires analysis of the microarray data; here we study the clustering of expression profiles, that is, time series of expression levels.
9.2 Monitoring cellular communication
- Purpose of microarrays: a snapshot of the expression levels in the cell.
- Expressed gene: DNA → mRNA → protein.
- In the cell, expressed genes therefore give rise to large numbers of mRNA molecules.
- Idea of microarrays: measure the concentrations of mRNA, and reverse-transcribe each mRNA into the DNA that corresponds to it.
- Because introns are spliced out of the RNA (only the exons remain), the DNA obtained in this backward direction is not entirely equal to the genomic DNA; it is called cDNA (complementary DNA).
A cDNA derived from mRNA points to an expressed gene; the cDNA sequence is stored as an EST (Expressed Sequence Tag). EST sequencing can identify genes that are missed by ab initio gene-finding methods, such as ORF finders.
9.3 Microarray technologies
- A microarray is an array of sensitive spots, each containing a stretch of DNA, e.g. based on an EST.
- Hybridization (chemical binding) of this DNA with molecules in the sample indicates the presence of the associated mRNA.
- The hybridization can be made visible by attaching fluorescent molecules (red, green) to the DNA and later illuminating them with a suitable laser.
Until recently we lacked tools to observe genome-wide expression. 1989 saw the introduction of the microarray technique by Stephen Fodor, but only in 1992 did this technique become generally available, and it was still very costly.
Figure: Example of an Affymetrix microarray simulation. (a) Simulated single-channel oligonucleotide microarray slide image (crop from the top left corner). An Affymetrix .cel file was used as the ground-truth data, so the text about the slide type is visible. (b) A real Affymetrix slide image, shown for comparison.
9.4 The diauxic shift and yeast gene expression
- In 1997 DeRisi et al. used microarrays to measure the genome-wide expression of S. cerevisiae during the diauxic shift.
- 9 initial hours of growth, 6 hours before the diauxic shift, and 6 hours thereafter.
- They measured the mRNA levels on the array at successive time steps and compared those with the mRNA levels at time 0.
This experiment gave a set of about 43,000 ratios: seven time points (t1, t2, ..., t7) of about 6400 gene expression levels, each normalized to its start value. This is the reference design in the microarray literature.
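The normalization to a start value can be sketched in a few lines. This is only an illustration with made-up numbers, not the processing used by DeRisi et al.; the function names `ratios_to_reference` and `log2_ratios` are hypothetical. Taking the base-2 logarithm is a common extra step, because it makes a doubling and a halving symmetric around zero.

```python
import math

def ratios_to_reference(levels, reference):
    """Divide each measured level of one gene by its reference (time-0) level."""
    if reference <= 0:
        raise ValueError("reference level must be positive")
    return [level / reference for level in levels]

def log2_ratios(ratios):
    """Log-transform ratios so that 2-fold up is +1 and 2-fold down is -1."""
    return [math.log2(r) for r in ratios]

# Toy measurements for one gene (assumed values, not real data).
profile = ratios_to_reference([2.0, 4.0, 1.0, 0.5], reference=2.0)
print(profile)               # ratios relative to the start value
print(log2_ratios(profile))  # the same profile on a log2 scale
```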
Such an experiment typically provides a time series that is short relative to the size of the genome: here m = 7 time points for n = 6400 genes. This is due to the cost of an array (about 1000 euro per array). With this kind of experiment we can in principle also reconstruct gene regulatory networks.
9.4.1 Data Description
- First analyse the relative change in activity.
- Less than 5% of the genes change more than 1.5-fold or less than 0.67-fold.
- Fold-change: let f = new_value / old_value. If f > 1 the fold-change is f; if f < 1 the fold-change is -1/f.
- Example: x0 = 1, x1 = 0.3333 gives a fold-change of -3; x0 = 1, x1 = 3 gives a fold-change of 3.
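The signed fold-change can be written as a small function. The sketch below follows the worked example (x1 = 0.3333 gives -3, x1 = 3 gives 3); the name `fold_change` and the choice to return 1 when nothing changes are assumptions made for illustration.

```python
def fold_change(old_value, new_value):
    """Signed fold-change: f for an increase, -1/f for a decrease."""
    if old_value <= 0 or new_value <= 0:
        raise ValueError("expression values must be positive")
    f = new_value / old_value
    return f if f >= 1 else -1.0 / f

print(fold_change(1.0, 3.0))     # increase
print(fold_change(1.0, 0.3333))  # decrease, reported as a negative number
```

A gene passes a threshold such as 1.5 when `abs(fold_change(x0, x1)) > 1.5`.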
9.4.1 Data Description (continued)
- Now select only those genes with an absolute fold-change above a certain threshold: abs(fold-change) > threshold.
9.4.2 Data Clustering
- Next, cluster the genes according to their expression profiles.
- Aim for high intra-cluster similarity and low inter-cluster similarity.
- Use a distance/similarity measure and a clustering algorithm.
Data Clustering
- 1. Define a suitable distance measure d(x1, x2), e.g. one based on Pearson's correlation coefficient, a normalized distance like the Mahalanobis distance, or a metric like the generalized p-norm.
- 2. Define a clustering criterion, e.g. $C = \sum_{i,j \text{ in same cluster}} d_{ij} - \sum_{i,j \text{ in different clusters}} d_{ij}$.
- 3. Apply a suitable clustering algorithm, e.g. hierarchical or K-means clustering.
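Steps 1 and 2 can be sketched as follows. The distance used here is 1 minus Pearson's correlation coefficient, which is one common way (an assumption, not the only option) to turn the correlation into a distance; the criterion is the within-cluster sum of distances minus the between-cluster sum, so lower values indicate a better clustering. All names and the toy profiles are hypothetical.

```python
import math

def pearson_distance(x, y):
    """1 - Pearson correlation: 0 for identical shape, 2 for opposite shape."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1.0 - cov / (sx * sy)

def clustering_criterion(profiles, labels, dist):
    """Sum of within-cluster distances minus sum of between-cluster distances."""
    within = between = 0.0
    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            d = dist(profiles[i], profiles[j])
            if labels[i] == labels[j]:
                within += d
            else:
                between += d
    return within - between

# Toy profiles: the first two rise together, the third falls.
profiles = [[1, 2, 3], [2, 4, 6], [3, 2, 1]]
good = clustering_criterion(profiles, [0, 0, 1], pearson_distance)
bad = clustering_criterion(profiles, [0, 1, 1], pearson_distance)
print(good, bad)  # the grouping with the lower value is preferred
```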
Hierarchical clustering
K-means clustering
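A minimal sketch of K-means, assuming squared Euclidean distance and caller-supplied initial centroids (real analyses usually try several random starts). The function names are hypothetical. The loop alternates between assigning each profile to its nearest centroid and moving each centroid to the mean of its members, and stops when the assignment no longer changes.

```python
def sq_dist(x, y):
    """Squared Euclidean distance between two profiles."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def kmeans(points, centroids, max_iter=100):
    """Return (labels, centroids) after alternating assignment and update steps."""
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its old centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

points = [[0, 0], [0, 1], [10, 10], [10, 11]]
labels, centers = kmeans(points, centroids=[points[0], points[2]])
print(labels, centers)
```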
Gene function and Clustering
- Genes with similar expression profiles often have similar functions.
Distance between two clusters A (members $x_i$) and B (members $y_j$)
- 1. Single linkage: $d_{AB} = \min_{i,j} \|x_i - y_j\|$.
- 2. Average linkage: $d_{AB} = \operatorname{mean}_{i,j} \|x_i - y_j\|$.
- 3. Centroid distance: $d_{AB} = \|m_A - m_B\|$, where $m_A$ and $m_B$ are the cluster means.
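The three cluster-to-cluster distances can be sketched as below, assuming that the distance between two individual profiles is Euclidean (the slide does not fix this choice). Function names and the toy clusters are hypothetical.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def single_linkage(A, B):
    """Smallest distance between any member of A and any member of B."""
    return min(euclidean(x, y) for x in A for y in B)

def average_linkage(A, B):
    """Mean distance over all pairs (one member from A, one from B)."""
    return sum(euclidean(x, y) for x in A for y in B) / (len(A) * len(B))

def centroid(cluster):
    """Component-wise mean of the members of a cluster."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]

def centroid_distance(A, B):
    """Distance between the two cluster means."""
    return euclidean(centroid(A), centroid(B))

A = [[0, 0], [0, 2]]
B = [[3, 0], [3, 2]]
print(single_linkage(A, B), average_linkage(A, B), centroid_distance(A, B))
```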
9.4.3 Data Visualisation
- In a tree, using hierarchical clustering.
- In a plane, using MDS (multidimensional scaling).
Gene function and Clustering
- 2. Multidimensional scaling (MDS)
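One way to obtain such a planar picture is classical (metric) MDS, sketched here in plain Python. It assumes a symmetric distance matrix and finds the leading eigenvectors by power iteration with a fixed start vector, which is enough for a small illustration but is not how a numerical library would do it; the function name `classical_mds` is hypothetical.

```python
import math

def classical_mds(dist, dims=2, iters=200):
    """Embed n items in `dims` dimensions from a symmetric n x n distance matrix."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double centering turns squared distances into inner products.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        v = [float(i + 1) for i in range(n)]  # fixed start vector (assumption)
        for _ in range(iters):  # power iteration for the leading eigenvector
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigval = sum(v[i] * bv[i] for i in range(n))
        if eigval < 1e-9:
            break  # no further positive eigenvalue: remaining axes stay zero
        for i in range(n):
            coords[i][d] = v[i] * math.sqrt(eigval)
        # Deflate so that the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j]
    return coords

# Distances of a 3-4-5 right triangle; the embedding should reproduce them.
dist = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
coords = classical_mds(dist)
print(coords)
```

The embedding is only determined up to rotation and reflection, so the printed coordinates matter less than the distances between them.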
Gene function and Clustering
- 1. Hierarchical clustering: level of cut-off
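The role of the cut-off level can be illustrated with a small agglomerative clustering, assuming single linkage and Euclidean distance (both are assumptions for this sketch). Clusters are merged closest-first, and merging stops as soon as the closest pair is further apart than the cut-off, which corresponds to cutting the tree at that height. The function name `agglomerate` is hypothetical.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def agglomerate(points, cutoff):
    """Single-linkage clustering; returns clusters as lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def link(a, b):
        return min(euclidean(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        d, i, j = min(
            (link(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > cutoff:
            break  # cutting the tree here: no merge below the cut-off remains
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

points = [[0], [1], [5], [6], [20]]
print(agglomerate(points, cutoff=2))    # a low cut-off gives many small clusters
print(agglomerate(points, cutoff=4.5))  # a higher cut-off gives fewer, larger ones
```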
Pre-processing
- Select only genes with enough fold-change.
- Delete missing values.
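Both pre-processing steps can be combined in one pass, as sketched below. The sketch assumes that each profile holds positive ratios relative to the start value, that a missing measurement is stored as `None` or NaN, and that a gene with any missing value is dropped entirely (imputing the value would be an alternative). Gene names, values and function names are made up.

```python
def has_missing(profile):
    """True if any value is None or NaN (NaN is the only value unequal to itself)."""
    return any(v is None or v != v for v in profile)

def max_abs_fold_change(profile):
    """Largest change in either direction, for ratios relative to the start value."""
    return max(max(r, 1.0 / r) for r in profile)

def preprocess(ratios_by_gene, threshold=1.5):
    """Keep genes without missing values whose fold-change exceeds the threshold."""
    kept = {}
    for gene, profile in ratios_by_gene.items():
        if has_missing(profile):
            continue
        if max_abs_fold_change(profile) > threshold:
            kept[gene] = profile
    return kept

data = {
    "geneA": [1.0, 2.0, 4.0],           # strongly induced: kept
    "geneB": [1.0, None, 3.0],          # missing value: dropped
    "geneC": [1.0, 1.1, 0.9],           # hardly changes: dropped
    "geneD": [1.0, 0.5, 0.25],          # strongly repressed: kept
    "geneE": [1.0, float("nan"), 5.0],  # missing value: dropped
}
print(sorted(preprocess(data)))
```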
Figure: Heatmap of the expression levels, with the time steps along one axis and the genes, ordered by hierarchical cluster, along the other.
9.5 Case study: cell-cycle regulated genes
- A set of microarrays over the cell cycle of yeast.
From being budded off from its parent cell to producing its own offspring, each yeast cell goes through a number of typical steps that also involve changes in gene expression, turning whole pathways on and off.
Here we examine the expression of the entire yeast genome through two rounds of the cell cycle. The temporal expression of the genes is measured by microarray at 24 time points, one every five hours. In detail, we have the expression profiles of about 6400 genes.
END of LECTURE 9