Agenda - PowerPoint PPT Presentation

Slides: 67
Provided by: george59
Transcript and Presenter's Notes

Title: Agenda


1
Agenda
  • Biological databases related to microarray
  • Gene Ontology
  • KEGG
  • Pathway enrichment analysis
  • Motif finding

2
1. Databases
Biological pathways and knowledge are very
complex
  • Is it possible to establish a database?
  • To systematically structure and manage the
    knowledge?
  • To validate analysis results or be incorporated
    into the analysis?

3
1.1 Gene Ontology
  • Ontologies: controlled vocabularies to describe
    functions of genes.
  • The database is structured as directed acyclic
    graphs (DAGs), which differ from hierarchical
    trees in that a 'child' (more specialized term)
    can have many 'parents' (less specialized terms).
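The difference between a tree and a DAG can be made concrete with a small sketch. The term names and parent links below are a hand-made toy example (an assumption for illustration, not a copy of the GO database), and `ancestors` is a hypothetical helper name.

```python
# Toy GO-like DAG: each term maps to its list of parent terms.
# Hypothetical data; real GO links and relationship types differ.
PARENTS = {
    "biological_process": [],
    "cellular process": ["biological_process"],
    "regulation of biological process": ["biological_process"],
    "cell cycle": ["cellular process"],
    # A child with two parents: this is what a strict tree cannot express.
    "regulation of cell cycle": ["regulation of biological process", "cell cycle"],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


if __name__ == "__main__":
    print(sorted(ancestors("regulation of cell cycle")))
```

Because a term can be reached along several paths, a gene annotated to a specialized term is implicitly annotated to all of its ancestors, which matters later when counting genes per GO term.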

4
1.1 Gene Ontology
Three major categories in Gene Ontology
Current term counts as of April 2, 2005 at 18:00
Pacific time: 17708 terms, 93.8% with
definitions.
  • 9263 biological_process
  • 1496 cellular_component
  • 6949 molecular_function
5
1.1 Gene Ontology
Evidence code: How is the information collected?
  • IC: inferred by curator
  • IDA: inferred from direct assay
  • IEA: inferred from electronic annotation
  • IEP: inferred from expression pattern
  • IGI: inferred from genetic interaction
  • IMP: inferred from mutant phenotype
  • IPI: inferred from physical interaction
  • ISS: inferred from sequence or structural
    similarity
  • NAS: non-traceable author statement
  • ND: no biological data available
  • RCA: inferred from reviewed computational analysis
  • TAS: traceable author statement
  • NR: not recorded
  • There may be (a lot of) errors in the database!!
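One practical use of evidence codes is to drop weakly supported annotations before an analysis. The sketch below is only an illustration: the choice of which codes to exclude is an assumption (not an official ranking), and the annotation records and the `filter_annotations` name are hypothetical.

```python
# Codes excluded in this sketch: electronic-only, no data, not recorded.
# This grouping is an illustrative assumption, not a GO recommendation.
EXCLUDED_CODES = {"IEA", "ND", "NR"}


def filter_annotations(annotations, excluded=EXCLUDED_CODES):
    """Keep (gene, go_term, evidence_code) tuples whose code is not excluded."""
    return [a for a in annotations if a[2] not in excluded]


# Hypothetical annotation records, for illustration only.
EXAMPLE = [
    ("geneA", "cell cycle", "IDA"),
    ("geneB", "cell cycle", "IEA"),
    ("geneC", "cell cycle", "TAS"),
    ("geneD", "cell cycle", "ND"),
]

if __name__ == "__main__":
    for gene, term, code in filter_annotations(EXAMPLE):
        print(gene, term, code)
```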

6
1.1 Gene Ontology
  • Demo
  • Go to GO: http://www.geneontology.org
  • Go to "Tools" and click on "AmiGO".
  • Click "Browse". Click on the boxes with "+" to
    expand any category to look at its subcategories.
    Click on "-" to collapse again.
  • Type the term "cell cycle" in the "Search GO"
    field. Press "Submit". You will then see all
    GO categories containing this word.
  • Click on a GO term, say "cell cycle arrest".
    Genes belonging to this GO term can be shown.
    Further filter genes by "Data source" or
    "Species".
  • Type the name "cyclin" in AmiGO. Change to the
    "genes or proteins" selection button and press
    "Submit". You will then see a number of genes
    containing this name. Press some of the "Tree
    view" links.
  • Note that in some cases, the same term category
    can exist in different places in the tree. This
    ontology is thus not strictly hierarchical, but
    shows complex "many-to-many" relationships
    between gene products, ontology terms and
    branches in the ontology tree. 

7
1.2 KEGG
http://www.genome.jp/kegg/pathway.html
8
1.2 KEGG: Kyoto Encyclopedia of Genes and Genomes
KEGG is a suite of databases and associated
software, integrating our current knowledge on
molecular interaction networks in biological
processes (PATHWAY database), the information
about the universe of genes and proteins
(GENES/SSDB/KO databases), and the information
about the universe of chemical compounds and
reactions (COMPOUND/GLYCAN/REACTION databases).
The current statistics of the KEGG databases are
as follows:
  • Number of pathways: 23,574 (PATHWAY database)
  • Number of reference pathways: 265 (PATHWAY
    database)
  • Number of ortholog tables: 87 (PATHWAY database)
  • Number of organisms: 272 (GENOME database)
  • Number of genes: 911,584 (GENES database)
  • Number of ortholog clusters: 35,456 (SSDB
    database)
  • Number of KO assignments: 6,221 (KO database)
  • Number of chemical compounds: 12,737 (COMPOUND
    database)
  • Number of glycans: 11,017 (GLYCAN database)
  • Number of chemical reactions: 6,399 (REACTION
    database)
  • Number of reactant pairs: 5,953 (RPAIR database)
9
1.2 KEGG
RNA polymerase
10
1.2 KEGG
Cell cycle
11
1.2 KEGG
Parkinson's disease
Alzheimer's disease, Huntington's disease, Prion
disease.
12
2. Enrichment analysis
  • After selecting DE (differentially expressed)
    genes, classification, or clustering, we are
    usually given a gene list for further
    investigation.

How do we validate information contained in the
gene list by available biological knowledge?
13
2. Enrichment analysis
Cell cycle data: cells are synchronized and
samples are taken at various time points (covering
2 cell cycles). 6162 genes are included.
From Fourier analysis, 800 genes with a cyclic gene
expression pattern are selected for further
investigation. Are these 800 genes really
involved in the cell cycle?
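The slide does not say how the Fourier analysis was carried out. As a rough sketch of the general idea only, the hypothetical function below scores one gene's time series by how much of its variance sits at a single assumed period (measured in number of samples). When the series covers a whole number of periods, a pure sinusoid at that period scores 1.

```python
import cmath
import math


def periodicity_score(values, period):
    """Share of the series' variance captured at the given period.

    `period` is in units of samples. This is an illustrative score,
    not the procedure used for the 800 genes on the slide.
    """
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    total = sum(c * c for c in centered)
    if total == 0:
        return 0.0  # a constant series has no periodic signal
    # Fourier coefficient of the centered series at frequency 1/period.
    coeff = sum(
        c * cmath.exp(-2j * math.pi * t / period)
        for t, c in enumerate(centered)
    )
    return 2 * abs(coeff) ** 2 / (n * total)


if __name__ == "__main__":
    # Toy series: 16 samples covering 2 cycles of period 8 (hypothetical data).
    cyclic = [math.cos(2 * math.pi * t / 8) for t in range(16)]
    trend = [float(t) for t in range(16)]
    print(periodicity_score(cyclic, 8), periodicity_score(trend, 8))
```

Genes could then be ranked by this score and the top of the list kept, which is one way to arrive at a gene list like the one discussed here.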
14
2. Enrichment analysis
http://db.yeastgenome.org/cgi-bin/GO/goTermMapper
15
2. Enrichment analysis
Is the selected set of genes enriched in the GO
term of cell cycle?
16
2. Enrichment analysis
19
2. Enrichment analysis
R code for chi-square test without continuity
correction:
> chisq.test(matrix(c(285, 5012, 100, 691), 2, 2), correct=F)
        Pearson's Chi-squared test
data:  matrix(c(285, 5012, 100, 691), 2, 2)
X-squared = 61.2644, df = 1, p-value = 4.99e-15
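To show what the test computes, here is a minimal Python sketch of the same Pearson chi-square statistic for a 2x2 table, using only the standard library. The function name `chi_square_2x2` is an assumption for illustration. The p-value uses the identity that, for 1 degree of freedom, the upper tail equals erfc(sqrt(X/2)).

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square without continuity correction for [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # R fills matrix(c(285, 5012, 100, 691), 2, 2) by column,
    # so the rows are (285, 100) and (5012, 691).
    stat, p = chi_square_2x2(285, 100, 5012, 691)
    print(f"X-squared = {stat:.4f}, p-value = {p:.3g}")
```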
20
2. Enrichment analysis
Chi-squared test is an approximate test and may
not perform well when the sample size is small.
Fisher's exact test is a better alternative.
Fisher's exact test: G genes in the genome
(G=1663) are analyzed. Functional category: F
(six functional categories). If, in a cluster of
size C, h genes are found to be in a functional
category F with m genes, then the p-value (i.e. the
probability of observing h or more annotated
genes in the cluster) is calculated from the
hypergeometric distribution (Tavazoie et al. 1999).
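The slide's own formula is not in the transcript. The probability of observing h or more annotated genes is the upper tail of the hypergeometric distribution, which in the notation above can be written as P = 1 - \sum_{i=0}^{h-1} \binom{m}{i} \binom{G-m}{C-i} / \binom{G}{C}. The sketch below computes the same tail by summing from h upward; the function name is an assumption and the numbers in the usage line are made up.

```python
from math import comb


def enrichment_p_value(G, m, C, h):
    """P(X >= h) for X ~ Hypergeometric.

    G: genes analyzed, m: genes in the functional category,
    C: cluster size, h: cluster genes found in the category.
    """
    upper = min(m, C)  # X cannot exceed either the category or the cluster
    tail = sum(comb(m, i) * comb(G - m, C - i) for i in range(h, upper + 1))
    return tail / comb(G, C)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(enrichment_p_value(G=1663, m=100, C=50, h=10))
```

Exact integer binomials avoid the rounding problems that appear when the individual terms are very small.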
21
2. Enrichment analysis
  • In practice, we need to search through thousands
    of GO terms to determine which GO term is
    enriched in the selected gene set.
  • Multiple comparison problem!!
  • Difficulties: tests are highly dependent.
  • Hierarchical structure of the GO
  • e.g. "Cell Proliferation" is a parent GO term of
    "Cell Cycle".
  • Each gene can belong to multiple GO terms.
  • e.g. the human HoxA7 gene belongs to four GO
    terms: "Development", "Nucleus", "DNA dependent
    regulation and transcription", "Transcription
    factor activity".
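Two common baseline adjustments for many tests can be sketched as follows. Both are assumptions about what one might apply here, not a method the slides prescribe. Bonferroni is valid under any dependence but conservative; the Benjamini-Hochberg procedure is usually justified under independence or positive dependence, so with strongly dependent GO terms its results should be read with care.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    # Visit p-values from largest to smallest, tracking the running minimum.
    order = sorted(range(n), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, i in enumerate(order):
        rank = n - position  # 1-based rank of p_values[i] in ascending order
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical p-values for four GO terms.
    raw = [0.01, 0.04, 0.03, 0.005]
    print(bonferroni(raw))
    print(benjamini_hochberg(raw))
```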

22
2. Enrichment analysis
  • Simple Fisher's exact test
  • Ingenuity Pathway
  • A commercial package with a good interface and
    human-curated annotation. Can generate network
    figures.
  • NIH DAVID
  • Free and web-based. Performs enrichment analysis
    (Fisher's exact test), adjusts for multiple
    comparisons and generates a table of results.
    Uses multiple databases.
  • GOstats package in Bioconductor
  • Free R package. Performs enrichment analysis
    (Fisher's exact test) and generates a table of
    results. Uses only the GO database.
  • More sophisticated and systematic methods
  • Gene set enrichment analysis (GSEA; MIT,
    Mesirov's group)
  • http://www.broad.mit.edu/gsea/
  • Gene set analysis (GSA; Stanford, Tibshirani's
    group)
  • http://www-stat.stanford.edu/~tibs/GSA/

23
2. Enrichment analysis
  • Things to note when using biological databases
  • Biological pathways and gene functions are
    complex and difficult to quantify.
  • Data may not be accurate. The analysis should
    take into account the strength of evidence.
  • May need to go to a specific database for a
    particular organism (e.g. SGD for yeast; FlyBase
    and BDGP for fly).
  • To systematically collect and manage massive
    biological knowledge from publications and
    experiments is an important and active research
    topic in bioinformatics.

24
3. Motif Finding
25
3. Motif Finding
http://web.indstate.edu/thcme/mwking/gene-regulation.html
26
3. Motif Finding
http://web.indstate.edu/thcme/mwking/gene-regulation.html
27
3. Motif Finding
  • Genes in a cluster have similar expression
    patterns.
  • They might share common regulatory motifs so they
    are expressed simultaneously.
  • It is of interest to find motifs from the gene
    clusters.
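As a deliberately naive sketch of the idea, one can enumerate short words (k-mers) in the upstream sequences of the genes in a cluster and ask which words occur in many of them. The sequences below are made-up toy data and `kmer_gene_counts` is a hypothetical name. Practical motif finders use probabilistic models and a background distribution rather than exact word counts.

```python
from collections import Counter


def kmer_gene_counts(sequences, k):
    """For each k-mer, count how many sequences contain it at least once."""
    counts = Counter()
    for seq in sequences:
        # A set, so a word repeated within one sequence is counted once.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(kmers)
    return counts


# Hypothetical upstream sequences for three genes in one cluster.
CLUSTER_PROMOTERS = [
    "TTACGCGTAA",
    "GGACGCGTCC",
    "CACGCGTTTG",
]

if __name__ == "__main__":
    for kmer, n in kmer_gene_counts(CLUSTER_PROMOTERS, 6).most_common(3):
        print(kmer, n)
```

A word shared by most genes in the cluster but rare elsewhere in the genome would be a candidate regulatory motif worth testing further.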

28
3. Motif Finding
The following materials are obtained from Shirley
Liu at Harvard.
29
3. Motif Finding