Title: From gene lists to functional annotations
1. From gene lists to functional annotations
We have a list of differentially expressed genes. Now what?
Find the functions of those genes, test which functions are enriched, and reduce false positives.
Image sources: http://www.aitbiotech.com/images/microarray.jpg http://www.pnas.org/content/104/51/20374/F4.large.jpg
2. The 3 Gene Ontologies
- Molecular Function: elemental activity or task
  - the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity
- Biological Process: biological goal or objective
  - broad biological goals, such as DNA repair or purine metabolism, that are accomplished by ordered assemblies of molecular functions
- Cellular Component: location or complex
  - subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme
Modified from http://anil.cchmc.org/Intro_FunGen_Feb2008_Jegga.ppt (slide 33)
3. Example gene: hammer
Function (what)                  Process (why)
Drive a nail into wood           Carpentry
Drive a stake into soil          Gardening
Smash a bug                      Pest control
A performer's juggling object    Entertainment
http://anil.cchmc.org/Intro_FunGen_Feb2008_Jegga.ppt (slide 34)
4. http://www.geneontology.org/
5. http://anil.cchmc.org/Intro_FunGen_Feb2008_Jegga.ppt (slide 47)
6. A small sample of post-microarray analysis tools
Databases and tools: PANTHER, ToppGene, STRING, GOTM, Onto-Tools, TF networks (P.A.I.N.T)
http://www.pantherdb.org
7. PANTHER Protein Classification System
http://www.pantherdb.org
8. What can I do on the PANTHER site?
PANTHER: Protein ANalysis THrough Evolutionary Relationships
Goal: the PANTHER site was designed to facilitate functional analysis of large numbers of genes, proteins or transcripts.
- Tools
  - Explore protein families, molecular functions, biological processes and pathways.
  - Generate lists of genes, proteins or transcripts that belong to a given protein family or subfamily, have a given molecular function, or participate in a given biological process or pathway, e.g. generate a candidate gene list for a disease.
  - Analyze lists of genes, proteins or transcripts in batch mode according to categories based on family, molecular function, biological process or pathway, e.g. analyze mRNA microarray data.
9. http://nar.oxfordjournals.org/cgi/content/full/31/1/334
http://genome.cshlp.org/content/13/9/2129.full
http://nar.oxfordjournals.org/cgi/content/full/33/suppl_1/D284
http://nar.oxfordjournals.org/cgi/content/full/35/suppl_1/D247
10. Single gene search
Batch gene search
http://www.pantherdb.org/sitemap.jsp
11. Convert gene list IDs
Affy ID to gene symbol
Affy IDs: 1788_S_AT 36651_AT 41788_I_AT 35595_AT 36285_AT 39586_AT 35160_AT 39424_AT
Gene symbols: USP1 DDR1 WNT10B PRKAR1B MLL CD44 GNA13 MMP15 IER3
http://david.abcc.ncifcrf.gov/tools.jsp
12. http://david.abcc.ncifcrf.gov/tools.jsp
- Paste the Affy ID list
- Select AFFY_ID as the ID type
- Select the list type: Gene List
- Submit the list
- Select HOMO SAPIENS as the species and press the select button
- Choose the Gene ID Conversion Tool
- Select GENE_SYMBOL, submit, and download the results
13. Perform a PANTHER batch search
- Copy the gene symbol list and paste it into the Batch Search in PANTHER (http://www.pantherdb.org/ > Batch Search)
- Select upload ID type: Gene Symbol
- Select file type: ID list
- Result page: Genes
- Select 1 dataset: NCBI H. sapiens
- Press the Search button
- Press the icon [image not available] and select Biological process
14. PANTHER export options
Click on either pie slices or bars to get sub-functions. Click on links to get gene lists for the chosen function.
15. Other PANTHER options
http://www.pantherdb.org/genes/
16. Other PANTHER options
Task: find genes in a specific ontology (or in a few ontologies)
http://www.pantherdb.org/panther/ontologies.jsp
17. Other PANTHER options
Search PANTHER Pathway
Add legend to pathway
http://www.pantherdb.org/pathway/
18. Other PANTHER options
Gene expression tools
http://www.pantherdb.org/tools/
19. Other PANTHER options
20. ToppGene
- Portal for
  - gene list functional enrichment
  - candidate gene prioritization using either functional annotations or network analysis
  - identification and prioritization of novel disease candidate genes in the interactome
http://toppgene.cchmc.org/ http://toppgene.cchmc.org/help/help.jsp
21. Hypergeometric distribution with Bonferroni correction
http://nar.oxfordjournals.org/cgi/reprint/gkp427v1
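A minimal sketch of how a hypergeometric enrichment test with Bonferroni correction can be computed for a single annotation term. This is an illustration, not ToppGene's actual implementation. The function names and all of the numbers (background size, term size, list size, overlap, number of terms tested) are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(N, M, n, k):
    """P(X >= k): probability of seeing k or more annotated genes in a list
    of n genes drawn without replacement from N genes, M of them annotated."""
    total = comb(N, n)
    upper = min(n, M)
    favourable = sum(comb(M, i) * comb(N - M, n - i) for i in range(k, upper + 1))
    return favourable / total


def bonferroni(p_value, num_tests):
    """Bonferroni-adjusted p-value: multiply by the number of tests, cap at 1."""
    return min(1.0, p_value * num_tests)


# Hypothetical numbers, chosen only for illustration.
N = 20000   # genes in the background set
M = 200     # background genes annotated with the term
n = 100     # genes in the submitted list
k = 8       # genes in the list annotated with the term
num_terms_tested = 1000

p = hypergeom_upper_tail(N, M, n, k)
p_adj = bonferroni(p, num_terms_tested)
print(p, p_adj)
```

With these numbers only about one annotated gene is expected in the list by chance (n * M / N = 1), so an overlap of 8 gives a small p-value even after the correction.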
22. Hypergeometric calculator results
Just 2 clarification slides.
- What is a hypergeometric experiment?
  - A hypergeometric experiment has the following characteristics:
    - The population has size N, of which M items are successes.
    - The researcher randomly selects a subset of n items from the population, without replacement.
    - Question: what is the probability that exactly k of the selected items are successes?
- What is a hypergeometric distribution?
  - A hypergeometric distribution is a probability distribution. It gives the probabilities associated with the number of successes in a hypergeometric experiment.
- Example
  - We have a pack of 52 cards (26 black, counted as successes). We randomly select 12 cards out of the 52.
  - What is the probability of having exactly 7 successes (black cards)? (0.21)
Hypergeometric calculator
http://stattrek.com/Tables/Hypergeometric.aspx
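The card example can be checked with a few lines of Python. This is a minimal sketch using only the standard library. The function name is hypothetical, and N, M, n, k follow the notation of the slide.

```python
from math import comb


def hypergeom_pmf(N, M, n, k):
    """P(X = k): exactly k successes among n items drawn without replacement
    from a population of N items that contains M successes."""
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)


# 52 cards, 26 black (successes), 12 cards drawn, 7 black wanted.
p = hypergeom_pmf(N=52, M=26, n=12, k=7)
print(round(p, 2))  # 0.21
```

The formula counts the ways to pick 7 of the 26 black cards and 5 of the 26 red cards, and divides by the number of ways to pick any 12 of the 52 cards.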
23. When detecting differentially expressed genes, we want to detect ONLY the differentially expressed genes, with no false positives!
Statistical corrections
In many analyses of biological experiments, a large number of false positives are found among the results. When making multiple comparisons, we need to apply a statistical correction to our threshold to remove as many false positives as possible. Commonly available statistical corrections:
http://cbi.labri.fr/outils/BlastSets/BlastSets_web_manual/principles.html
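As an illustration of what such corrections do, the sketch below compares two widely used ones on a made-up list of p-values: Bonferroni (divide the threshold by the number of tests) and the Benjamini-Hochberg false discovery rate (FDR) procedure. The function names and the p-values are hypothetical. The sketch is not taken from the linked manual.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def count_significant(pvalues, alpha=0.05):
    """Count the tests that pass under Bonferroni and under BH at level alpha."""
    m = len(pvalues)
    bonferroni_hits = sum(p <= alpha / m for p in pvalues)
    bh_hits = sum(q <= alpha for q in benjamini_hochberg(pvalues))
    return bonferroni_hits, bh_hits


# Made-up p-values, for illustration only.
pvals = [0.001, 0.008, 0.012, 0.03, 0.04, 0.2, 0.5, 0.9]
print(count_significant(pvals))
```

Bonferroni controls the chance of any false positive and is therefore stricter. The FDR procedure controls the expected fraction of false positives among the reported results, so it usually keeps more of them.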
25. Example
- Go to the ToppGene web page: http://toppgene.cchmc.org/
- Choose the ToppFun link
- Copy the gene symbol list and paste it into the provided box; make sure that the entry type is HGNC symbol, then press the Submit Query button
- Go to the bottom of the page, choose the FDR correction method for all features, and submit
- Examine the details of the results, one at a time
26. Example a. Using ToppFun for gene list enrichment analysis
Run a gene list enrichment analysis on obesity-associated genes.
29. b. Using ToppGene for disease gene prioritization based on functional similarity to training set genes
Query: rank or prioritize a list of genes (test set) by functional annotation similarity to the training set.
Calculates a score and a p-value for the genes and functions.
30. c. Using ToppNet for disease gene prioritization based on topological features in the protein-protein interaction network (PPIN)
Query: rank or prioritize a list of genes (test set) based on topological features in the PPIN.
32. d. Using ToppGenet to identify and prioritize the neighboring genes of the "seeds" or training set in the protein-protein interaction network (PPIN)
Query: rank or prioritize a list of genes in the interactome of the training set genes using either functional similarity (ToppGene) or PPIN analysis (ToppNet).
Create the network by functional similarity (ToppGene) or network analysis (ToppNet).
With distance to seeds set to 1, the test set comprises all genes that are immediate interactants of the training set genes.
- Purple nodes are the training set or seed genes.
- Grey nodes are the interactants from the test set.
- Green nodes (a subset of the grey ones) are the top-ranked genes from the test set.
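A minimal sketch of how a test set at distance 1 from the seeds could be collected from a list of interactions. It only illustrates the idea and is not ToppGenet's implementation. The function name, node names and edges are hypothetical, and the sketch assumes that seed genes themselves are excluded from the test set.

```python
def immediate_interactants(edges, seeds):
    """Return all genes that interact directly with at least one seed gene.

    edges: iterable of (gene_a, gene_b) pairs, treated as undirected.
    seeds: iterable of seed (training set) genes.
    Seed genes are left out of the result (an assumption of this sketch).
    """
    seeds = set(seeds)
    neighbours = set()
    for a, b in edges:
        if a in seeds and b not in seeds:
            neighbours.add(b)
        if b in seeds and a not in seeds:
            neighbours.add(a)
    return neighbours


# Hypothetical toy network: S1 and S2 are seeds, G1..G4 are other genes.
edges = [("S1", "G1"), ("S1", "S2"), ("S2", "G2"), ("G2", "G3"), ("G4", "S1")]
seeds = ["S1", "S2"]

test_set = immediate_interactants(edges, seeds)
print(sorted(test_set))  # ['G1', 'G2', 'G4']
```

G3 is not part of the test set because it only reaches a seed through G2, which puts it at distance 2.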
33. A shift of focus to systems biology in the post-genomic era
http://string-db.org/
STRING (Search Tool for the Retrieval of Interacting Genes/Proteins): functional connectivity within a proteome
STRING is a database and web resource dedicated to protein-protein interactions, including both physical and functional interactions. It weights and integrates information from numerous sources, including experimental repositories, computational prediction methods and public text collections, thus acting as a meta-database that maps all interaction evidence onto a common set of genomes and proteins.
Version 8.0 of STRING covers about 2.5 million proteins from 630 organisms.
Databases: MINT, HPRD, BIND, DIP, BioGRID, KEGG and Reactome, IntAct, EcoCyc, NCI-Nature Pathway Interaction Database, Gene Ontology (GO) protein complexes, SGD, OMIM, The Interactive Fly, and all abstracts from PubMed
35. http://bioinfo.vanderbilt.edu/gotm/
http://bioinfo.vanderbilt.edu/gotm/GOTM_Manual.pdf
36. Bar graph
Pathway details
Input details
Pathway gene details (all genes in pathway)
37. The apoptosis pathway as described by KEGG
Underexpressed genes, overexpressed genes
38. TF networks (P.A.I.N.T)
http://www.dbi.tju.edu/dbi/tools/paint/
39. SUSPECTS is a server designed to automate the first steps of the candidate gene approach.
http://www.genetics.med.ed.ac.uk/suspects/search.shtml
BRCA1
The 3D boxes represent genes. Higher, brighter boxes represent better (higher-scoring) candidates. The width of a box corresponds to the number of different types of evidence that contribute to its score. If a box is blue, a potentially relevant PubMed abstract has been found.
40. http://www.genetics.med.ed.ac.uk/prospectr/
BRCA1
PROSPECTR uses sequence features to rank genes in order of their likelihood of involvement in disease.