Title: Gene function analysis
1Gene function analysis
- Stem Cell Network
- Microarray Course, Unit 5
- May 2007
2Sections
- Introduction to Gene Ontology
- GOstat
- Example
3Gene Ontology
- Michael Ashburner
- Annotate genes or proteins
- Started for Drosophila melanogaster (fly). Now expanded for all taxa
- http://www.geneontology.org
4Gene Ontology
- Biological process: A phenomenon marked by changes that lead to a particular result, mediated by one or more gene products.
- Molecular function: Elemental activities, such as catalysis or binding, describing the actions of a gene product at the molecular level. A given gene product may exhibit one or more molecular functions.
- Cellular component: The part of a cell of which a gene product is a component; for the purpose of GO, this includes the extracellular environment of cells. A gene product may be a component of one or more parts of a cell. This term includes gene products that are parts of macromolecular complexes, by the definition that all members of a complex normally copurify under all except extreme conditions.
5Gene Ontology
Biological process
http://www.geneontology.org/GO_nature_genetics_2000.pdf
6Gene Ontology
http://www.geneontology.org/GO_nature_genetics_2000.pdf
7Gene Ontology
http://www.geneontology.org/GO_nature_genetics_2000.pdf
8Gene Ontology
Evidence codes: http://www.geneontology.org/GO.evidence.shtml
- IC: Inferred by Curator
- IDA: Inferred from Direct Assay
- IEA: Inferred from Electronic Annotation
- IEP: Inferred from Expression Pattern (2006)
- IGC: Inferred from Genomic Context (2007)
- IGI: Inferred from Genetic Interaction
- IMP: Inferred from Mutant Phenotype
- IPI: Inferred from Physical Interaction
- ISS: Inferred from Sequence or Structural Similarity
- NAS: Non-traceable Author Statement (2006)
- ND: No biological Data available
- RCA: Inferred from Reviewed Computational Analysis
- TAS: Traceable Author Statement
- NR: Not Recorded (2006)
9Gene Ontology
Stats. May 29th 2007.
- biological_process: 13,553 terms (10,894 in 2006; 9,277 in 2005)
- cellular_component: 1,966 terms (1,815 in 2006; 1,512 in 2005)
- molecular_function: 7,609 terms (7,927 in 2006; 6,957 in 2005)
- Total: 23,128 terms (20,636 in 2006; 17,746 in 2005)
10Gene Ontology
Stats. May 29th 2007. Mouse Genome Informatics (The Jackson Laboratory, http://www.informatics.jax.org/)
- biological_process: 14,200 genes, 42,675 annotations (3.0 kw/gene); 13,329 genes, 33,783 annotations (2.5 kw/gene) in 2006
- cellular_component: 14,713 genes, 31,330 annotations (2.1 kw/gene); 13,547 genes, 26,515 annotations (2.0 kw/gene) in 2006
- molecular_function: 15,553 genes, 50,343 annotations (3.2 kw/gene); 14,056 genes, 40,806 annotations (2.9 kw/gene) in 2006
- 8.3 terms per gene; 7.5 in 2006
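The per-gene figures can be rechecked from the gene and annotation counts. The sketch below assumes that "kw/gene" means annotations per annotated gene and that the overall figure is the sum of the three rounded ratios; both readings are assumptions, and the variable names are illustrative only.

```python
# 2007 MGI figures copied from the statistics above: (genes, annotations).
mgi_2007 = {
    "biological_process": (14_200, 42_675),
    "cellular_component": (14_713, 31_330),
    "molecular_function": (15_553, 50_343),
}


def annotations_per_gene(genes, annotations):
    """Mean number of annotations per annotated gene."""
    return annotations / genes


# Ratio for each ontology, rounded to one decimal as on the slide.
per_gene = {
    ontology: round(annotations_per_gene(genes, annotations), 1)
    for ontology, (genes, annotations) in mgi_2007.items()
}

for ontology, ratio in per_gene.items():
    print(f"{ontology}: {ratio} annotations per gene")

# Assumed construction of the overall figure: sum of the rounded ratios.
total_per_gene = round(sum(per_gene.values()), 1)
print(f"all three ontologies: {total_per_gene} terms per gene")
```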
11Databases using Gene Ontology
- NetAffx (Affymetrix probe annotations)
- FlyBase (sequences) was the first
- SGD (yeast)
- MGI (mouse)
- InterPro (protein sequences)
- ProDom (protein domains)
- Entrez Gene (gene information)
12GOstat
Find statistically overrepresented properties within a group of genes, as selected by... typically, analysis of a DNA microarray experiment.
http://gostat.wehi.edu.au/
Beissbarth & Speed (2004) Bioinformatics, 20: 1464-1465.
13GOstat
Selected genes: gene A, gene B, gene C, gene D, gene E
- X, X: Total set of genes: 2,000 of 5,000 are X. Not significant.
- Y, Y: Total set of genes: 4 of 5,000 are Y. Very significant.
- Do it for all Gene Ontology terms
- Take into account the structure of the ontology
- Sort by p-values
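The toy example above can be sketched as a loop over terms. This is a minimal illustration, not GOstat's implementation: it uses an exact hypergeometric tail as the test, assumes that two of the five selected genes carry each annotation (one reading of the diagram), and ignores the structure of the ontology. The function and variable names are hypothetical.

```python
from fractions import Fraction
from math import comb


def overrepresentation_p(k, n, K, N):
    """P(at least k annotated genes among n drawn from N genes, K of them annotated)."""
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return float(Fraction(tail, total))


# 5 selected genes out of a total set of 5,000.
N_TOTAL, N_SELECTED = 5000, 5

# "in_selection" counts are an assumed reading of the diagram (two genes each).
terms = {
    "X": {"in_selection": 2, "in_total": 2000},
    "Y": {"in_selection": 2, "in_total": 4},
}

# Test every term, then sort by p-value (most significant first).
results = sorted(
    (overrepresentation_p(t["in_selection"], N_SELECTED, t["in_total"], N_TOTAL), name)
    for name, t in terms.items()
)

for p, name in results:
    print(f"{name}: p = {p:.2g}")
```

With real data the same loop would run over every Gene Ontology term annotated to at least one selected gene, and testing that many terms at once would normally also call for a multiple-testing correction.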
14Contingency Table
                          selected genes                      reference group
                          (e.g. differentially expressed)     (e.g. all genes on array)
genes with GO in group    51                                  467
total genes in group      176                                 9180

p-value: 8e-52
Chi-square Test (Fisher's Exact Test for small values)
Probability of obtaining those values from a random distribution.
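As a rough illustration of the chi-square test on these counts, the sketch below builds a 2x2 table and computes the Pearson statistic with one degree of freedom. It assumes the selected and reference groups are treated as separate groups and applies no continuity correction. GOstat's exact table layout is not given here, so the printed value need not equal the p-value on the slide.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the upper tail is erfc(sqrt(statistic / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Counts from the table above.
selected_with_go, selected_total = 51, 176
reference_with_go, reference_total = 467, 9180

# Assumption: rows are the two groups, columns are "with GO" / "without GO".
statistic, p_value = chi_square_2x2(
    selected_with_go, selected_total - selected_with_go,
    reference_with_go, reference_total - reference_with_go,
)

print(f"chi-square = {statistic:.1f}, p = {p_value:.1e}")
```

When any expected cell count is small, the chi-square approximation is unreliable, which is why Fisher's exact test is the usual fallback for small values.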
15Web tool
16Web tool
17Output
18Example
We will study the function of a set of genes selected via StemBase, http://www.stembase.ca/ (see corresponding Unit for more info on using StemBase).
http://gostat.wehi.edu.au/
191. Select a set of genes
Objective: Genes correlated to Lgals3bp (lectin, galactoside-binding, soluble, 3 binding protein).
A galectin, a beta-galactoside-binding protein implicated in modulating cell-cell and cell-matrix interactions.
201. Select a set of genes
211. Select a set of genes
221. Select a set of genes
231. Select a set of genes
252. Run in GOstat
262. Run in GOstat
Calcium ion binding
mannosyl-oligosaccharide mannosidase activity
272. Run in GOstat
http://www.geneontology.org/amigo
282. Run in GOstat
Calcium ion binding
mannosyl-oligosaccharide mannosidase activity
292. Run in GOstat
303. Examine expression
- MAN2A1: 1448647_at
- MAN1A: 1417111_at
- Lgals3bp: 1448380_at
313. Examine expression
323. Examine expression
33To know more
- Gene Ontology. http://www.geneontology.org/GO.doc.shtm
- GOstat
  - http://gostat.wehi.edu.au
  - Beissbarth & Speed (2004) Bioinformatics, 20: 1464-1465.
- StemBase. http://www.stembase.ca
  - See corresponding Unit in this course.