Title: Gene expression analysis
Tutorial 7
Gene expression analysis
- Expression data
  - GEO
  - UCSC
  - ArrayExpress
- General clustering methods
  - Unsupervised clustering
    - Hierarchical clustering
    - K-means clustering
- Tools for clustering
  - EPCLUST
  - MeV
- Functional analysis
  - GO annotation
Gene expression data sources
Microarrays
RNA-seq experiments
Expression Data Matrix
Exp1 Exp2 Exp3 Exp4 Exp5 Exp6
Gene 1 -1.2 -2.1 -3 -1.5 1.8 2.9
Gene 2 2.7 0.2 -1.1 1.6 -2.2 -1.7
Gene 3 -2.5 1.5 -0.1 -1.1 -1 0.1
Gene 4 2.9 2.6 2.5 -2.3 -0.1 -2.3
Gene 5 0.1 2.6 2.2 2.7 -2.1
Gene 6 -2.9 -1.9 -2.4 -0.1 -1.9 2.9
- Each column represents all the gene expression levels from a single experiment.
- Each row represents the expression of a gene across all experiments.
- Each element is a log ratio, log2(T/R), where:
  - T is the gene expression level in the test sample
  - R is the gene expression level in the reference sample
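A minimal sketch of how one matrix element is computed from the two expression levels. The intensities used below are hypothetical values, not data from the slides.

```python
import math

def log_ratio(t: float, r: float) -> float:
    """Return log2(T/R) for test-sample level T and reference level R."""
    if t <= 0 or r <= 0:
        raise ValueError("expression levels must be positive")
    return math.log2(t / r)

# Hypothetical intensities, for illustration only
print(log_ratio(800.0, 200.0))  # 2.0  (T is four times R)
print(log_ratio(200.0, 800.0))  # -2.0 (T is a quarter of R)
print(log_ratio(500.0, 500.0))  # 0.0  (T equals R)
```

Taking the logarithm makes up- and down-regulation symmetric: a twofold increase gives +1 and a twofold decrease gives -1.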
Black indicates a log ratio of zero, i.e. T = R
Green indicates a negative log ratio, i.e. T < R
Grey indicates missing data
Red indicates a positive log ratio, i.e. T > R
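The color convention above can be sketched as a small mapping function. This only picks the hue; in typical heat-map displays the brightness of red or green also grows with the magnitude of the log ratio, which is left out here. The tolerance `tol` for "zero" and the example row are assumptions for illustration.

```python
import math
from typing import Optional

def cell_color(value: Optional[float], tol: float = 1e-9) -> str:
    """Map one log ratio to the display color used in the matrix view."""
    if value is None or math.isnan(value):
        return "grey"   # missing data
    if value > tol:
        return "red"    # positive log ratio: T > R
    if value < -tol:
        return "green"  # negative log ratio: T < R
    return "black"      # log ratio of (approximately) zero: T = R

# Hypothetical row containing a zero and a missing value
row = [-1.2, -2.1, 0.0, None, 1.8]
print([cell_color(v) for v in row])  # ['green', 'green', 'black', 'grey', 'red']
```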
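Microarray Data: Different representations
[Figure: expression profiles plotted as log ratio (y-axis) against experiment (x-axis); values above zero indicate T > R, values below zero indicate T < R]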
How to search for expression profiles
- GEO (Gene Expression Omnibus)
  - http://www.ncbi.nlm.nih.gov/geo/
- Human genome browser
  - http://genome.ucsc.edu/
- ArrayExpress
  - http://www.ebi.ac.uk/arrayexpress/
Searching for expression profiles in GEO
Datasets - suitable for analysis with GEO tools
Expression profiles by gene
Probe sets
Microarray experiments
Groups of related microarray experiments
Clustering
Download dataset
Statistical analysis
Clustering analysis
The expression distribution for different lines in the cluster
Searching for expression profiles in the Human Genome Browser
Keratin 10 is highly expressed in skin
ArrayExpress
http://www.ebi.ac.uk/arrayexpress/
How to analyze gene expression data
Unsupervised Clustering - Hierarchical Clustering
Hierarchical Clustering
Genes with similar expression patterns are grouped together and are connected by a series of branches (a dendrogram).
[Figure: dendrogram of six genes, with leaves ordered 2, 1, 6, 3, 5, 4]
Leaves (shapes in our case) represent genes and
the length of the paths between leaves represents
the distances between genes.
How do we determine the similarity between two genes (for clustering)?
Patrik D'haeseleer, "How does gene expression clustering work?", Nature Biotechnology 23, 1499-1501 (2005), http://www.nature.com/nbt/journal/v23/n12/full/nbt1205-1499.html
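Two common answers to this question are Euclidean distance and Pearson correlation. The sketch below computes both for rows of the expression data matrix (Gene 1 against Gene 2 and Gene 6). Choosing these two measures is an assumption for illustration; other measures exist, and the choice changes which genes end up grouped together.

```python
import math
from typing import Sequence

def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: +1 same shape, -1 mirror image, 0 no linear relation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Rows taken from the expression data matrix (Exp1..Exp6)
gene1 = [-1.2, -2.1, -3.0, -1.5, 1.8, 2.9]
gene2 = [2.7, 0.2, -1.1, 1.6, -2.2, -1.7]
gene6 = [-2.9, -1.9, -2.4, -0.1, -1.9, 2.9]

for name, other in [("Gene 2", gene2), ("Gene 6", gene6)]:
    r = pearson(gene1, other)
    # 1 - r turns the correlation into a distance (0 = identical shape)
    print(f"Gene 1 vs {name}: Euclidean = {euclidean(gene1, other):.2f}, "
          f"r = {r:.2f}, 1 - r = {1 - r:.2f}")
```

Euclidean distance compares absolute expression levels, while correlation compares only the shape of the profiles across experiments.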
Hierarchical clustering finds an entire hierarchy of clusters.
If we want a certain number of clusters, we need to cut the tree at the level that yields that number (in this case, four).
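A minimal sketch of agglomerative (bottom-up) hierarchical clustering followed by a cut that leaves k clusters. It assumes Euclidean distance and average linkage; single or complete linkage are equally valid choices and can give different trees. Gene 5 is left out because one of its six values is missing in the example matrix. The helper names (`agglomerate`, `cut_tree`) are hypothetical.

```python
import math
from itertools import combinations

# Complete rows of the example matrix (Gene 5 omitted: one value is missing)
profiles = {
    "Gene 1": [-1.2, -2.1, -3.0, -1.5, 1.8, 2.9],
    "Gene 2": [2.7, 0.2, -1.1, 1.6, -2.2, -1.7],
    "Gene 3": [-2.5, 1.5, -0.1, -1.1, -1.0, 0.1],
    "Gene 4": [2.9, 2.6, 2.5, -2.3, -0.1, -2.3],
    "Gene 6": [-2.9, -1.9, -2.4, -0.1, -1.9, 2.9],
}

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def average_linkage(c1, c2):
    """Mean distance over all gene pairs drawn from the two clusters."""
    total = sum(euclidean(profiles[a], profiles[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))

def agglomerate(names):
    """Repeatedly merge the two closest clusters; return every level of the tree."""
    clusters = [frozenset([n]) for n in names]
    history = [list(clusters)]
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: average_linkage(*p))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        history.append(list(clusters))
    return history

def cut_tree(history, k):
    """Cutting the dendrogram where k branches remain = the level with k clusters."""
    return next(level for level in history if len(level) == k)

history = agglomerate(profiles)
for cluster in cut_tree(history, 3):
    print(sorted(cluster))
```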
Hierarchical clustering result
Five clusters
Unsupervised Clustering - K-means clustering
An algorithm that classifies the data into K groups.
K = 4
How does it work?
1. k initial "means" (in this case k = 3) are randomly selected from the data set (shown in color).
2. k clusters are created by associating every observation with the nearest mean.
3. The centroid of each of the k clusters becomes the new mean.
4. Steps 2 and 3 are repeated until convergence has been reached.
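The algorithm iteratively divides the genes into K groups and recalculates the center of each group. The result is a set of K clusters in which every gene is assigned to its nearest center. Because the result depends on the initial means, it is a locally optimal grouping rather than a guaranteed best one.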
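The four steps above can be sketched as Lloyd's k-means algorithm. The comments mark which step each part implements. The data are the complete rows of the example matrix; the fixed random seed and the use of Euclidean distance are assumptions, and a different seed may produce different clusters.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]

def kmeans(points: List[Vector], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's k-means: returns one cluster label per point and the k means."""
    rng = random.Random(seed)
    # Step 1: choose k data points at random as the initial means.
    means = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Step 2: associate every point with its nearest mean (Euclidean distance).
        new_labels = [min(range(k), key=lambda j: math.dist(p, means[j]))
                      for p in points]
        # Step 4: stop when no point changes cluster (convergence).
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: the centroid of each cluster becomes its new mean.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous mean
                means[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, means

names = ["Gene 1", "Gene 2", "Gene 3", "Gene 4", "Gene 6"]
points = [
    [-1.2, -2.1, -3.0, -1.5, 1.8, 2.9],
    [2.7, 0.2, -1.1, 1.6, -2.2, -1.7],
    [-2.5, 1.5, -0.1, -1.1, -1.0, 0.1],
    [2.9, 2.6, 2.5, -2.3, -0.1, -2.3],
    [-2.9, -1.9, -2.4, -0.1, -1.9, 2.9],
]
labels, means = kmeans(points, k=2)
for j in range(2):
    print(f"Cluster {j + 1}:", [n for n, lab in zip(names, labels) if lab == j])
```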
How should we determine K?
- Trial and error
- Take K as the square root of the number of genes
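Both heuristics can be sketched in a few lines. `sqrt_rule` is the square-root rule from the slide. For trial and error, one hedged option is to score each candidate clustering by its within-cluster sum of squares (WCSS); using WCSS as the score is an assumption, not something the slide prescribes. The toy data are hypothetical.

```python
import math
from typing import Sequence

def sqrt_rule(n_genes: int) -> int:
    """Rule of thumb from the slide: K is roughly the square root of the gene count."""
    return max(1, round(math.sqrt(n_genes)))

def wcss(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Within-cluster sum of squares: squared distance of each point to its centroid."""
    total = 0.0
    for cluster in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == cluster]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(math.dist(p, centroid) ** 2 for p in members)
    return total

print(sqrt_rule(100))  # 10

# Hypothetical 1-D data with two obvious groups; compare two candidate clusterings
toy = [[0.0], [2.0], [10.0], [12.0]]
print(wcss(toy, [0, 0, 0, 0]))  # K = 1 -> 104.0
print(wcss(toy, [0, 0, 1, 1]))  # K = 2 -> 4.0
```

WCSS tends to fall as K grows (it reaches zero when every gene is its own cluster), so in trial and error one looks for the K beyond which it stops dropping sharply.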
Tools for clustering - EPCLUST
http://www.bioinf.ebc.ee/EP/EP/EPCLUST/
In the input matrix, each column should represent a gene and each row should represent an experiment (or individual).
Hierarchical clustering
Edit the input matrix: Transpose, Normalize, Randomize
K-means clustering
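The "Transpose" and "Normalize" edits can be sketched as plain matrix operations. The slide does not state which normalization EPCLUST applies; scaling each row to mean 0 and standard deviation 1 (a z-score) is an assumption shown here only as one common choice. The two rows are taken from the expression data matrix, where rows are genes.

```python
import statistics
from typing import List

Matrix = List[List[float]]

def transpose(m: Matrix) -> Matrix:
    """Swap rows and columns (genes-as-rows <-> genes-as-columns)."""
    return [list(col) for col in zip(*m)]

def normalize_rows(m: Matrix) -> Matrix:
    """Scale each row to mean 0 and (population) standard deviation 1."""
    out = []
    for row in m:
        mu = statistics.mean(row)
        sd = statistics.pstdev(row)
        # A constant row has no spread; map it to all zeros instead of dividing by 0
        out.append([(v - mu) / sd if sd > 0 else 0.0 for v in row])
    return out

# Rows = genes, columns = experiments (Gene 1 and Gene 2 from the matrix)
genes_by_exp = [
    [-1.2, -2.1, -3.0, -1.5, 1.8, 2.9],
    [2.7, 0.2, -1.1, 1.6, -2.2, -1.7],
]
exp_by_genes = transpose(genes_by_exp)  # rows = experiments, columns = genes
print(len(exp_by_genes), len(exp_by_genes[0]))  # 6 2
print(normalize_rows(genes_by_exp)[0])
```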
Hierarchical clustering
Data
Clusters
K-means clustering
Samples found in cluster
Graphical representation of the cluster
10 clusters, as requested
Tools for clustering - MeV
http://www.tm4.org/mev/
Gene expression: functional analysis
1007_s_at 1053_at 117_at 121_at 1255_g_at 1294_at
1316_at 1320_at 1405_i_at 1431_at 1438_at 1487_at
1494_f_at 1598_g_at
What can we learn from clusters?
Gene Ontology (GO)
http://www.geneontology.org/
The Gene Ontology project provides an ontology of defined terms representing gene product properties. The ontology covers three domains:
- Cellular Component (CC) - the parts of a cell or its extracellular environment.
- Molecular Function (MF) - the elemental activities of a gene product at the molecular level, such as binding or catalysis.
- Biological Process (BP) - operations or sets of molecular events with a defined beginning and end, pertinent to the functioning of integrated living units: cells, tissues, organs, and organisms.
The GO tree
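Although called a tree, GO is a directed acyclic graph: a term can have more than one parent, and a gene annotated to a term is implicitly annotated to all of that term's ancestors (the "true path rule"). The sketch below walks parent links upward. The term names and links form an illustrative fragment chosen for the example, not an excerpt of the current GO release.

```python
from typing import Dict, List, Set

# Illustrative fragment only: names and links are chosen for the example
PARENTS: Dict[str, List[str]] = {
    "biological_process": [],
    "metabolic process": ["biological_process"],
    "cellular process": ["biological_process"],
    # Two parents: this is why GO is a graph rather than a strict tree
    "cellular metabolic process": ["metabolic process", "cellular process"],
}

def ancestors(term: str) -> Set[str]:
    """All terms reachable by following parent links upward."""
    seen: Set[str] = set()
    stack = list(PARENTS[term])
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen

print(sorted(ancestors("cellular metabolic process")))
# ['biological_process', 'cellular process', 'metabolic process']
```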
GO sources (evidence codes)
- ISS: Inferred from Sequence/Structural Similarity
- IDA: Inferred from Direct Assay
- IPI: Inferred from Physical Interaction
- TAS: Traceable Author Statement
- NAS: Non-traceable Author Statement
- IMP: Inferred from Mutant Phenotype
- IGI: Inferred from Genetic Interaction
- IEP: Inferred from Expression Pattern
- IC: Inferred by Curator
- ND: No Data available
- IEA: Inferred from Electronic Annotation
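Evidence codes differ in reliability; for example, IEA annotations are assigned automatically without review by a curator. Some analyses therefore filter annotations by evidence code before interpreting them. The record layout and gene names below are hypothetical, and excluding IEA and ND by default is a choice made for this sketch, not a fixed rule.

```python
from typing import Iterable, List, NamedTuple

class Annotation(NamedTuple):
    gene: str      # hypothetical gene identifier
    term: str      # GO term name
    evidence: str  # evidence code, e.g. "IDA" or "IEA"

def filter_by_evidence(annotations: Iterable[Annotation],
                       exclude: Iterable[str] = ("IEA", "ND")) -> List[Annotation]:
    """Drop annotations whose evidence code is in `exclude`."""
    excluded = set(exclude)
    return [a for a in annotations if a.evidence not in excluded]

# Hypothetical annotation records, for illustration only
records = [
    Annotation("geneA", "protein binding", "IPI"),
    Annotation("geneA", "nucleus", "IEA"),
    Annotation("geneB", "catalytic activity", "IDA"),
    Annotation("geneC", "molecular_function", "ND"),
]
for a in filter_by_evidence(records):
    print(a.gene, a.term, a.evidence)
```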
Search by AmiGO
Results for alpha-synuclein
DAVID
http://david.abcc.ncifcrf.gov/
Functional Annotation Bioinformatics Microarray
Analysis
- Identify enriched biological themes, particularly GO terms
- Discover enriched functionally related gene/protein groups
- Cluster redundant annotation terms
- Explore gene names in batch
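"Enriched" means a term appears in the gene list (for example, one cluster) more often than chance would predict from the background. A minimal sketch of that idea is the one-sided hypergeometric test below. DAVID itself reports an EASE score, a modified and more conservative version of Fisher's exact test, so this plain test is a simplified stand-in rather than DAVID's method; correction for testing many terms is also not shown. The numbers in the example are hypothetical.

```python
from math import comb

def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value, P(X >= k).

    N: genes in the background (e.g. all genes on the array)
    K: background genes annotated with the GO term
    n: genes in the uploaded list / cluster
    k: list genes annotated with the term
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Hypothetical numbers: 10 of 100 cluster genes carry a term that 200 of
# 10,000 background genes carry (about 2 would be expected by chance)
print(f"{enrichment_p(10, 100, 200, 10_000):.2e}")
```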
Annotation
Classification
ID conversion
Functional annotation
Upload
Annotation options