Title: Introduction to the GO: a user's guide
1 Introduction to the GO: a user's guide
- Iowa State Workshop
- 11 June 2009
2 All workshop materials are available at AgBase.
3 Genomic Annotation
- Genome annotation is the process of attaching biological information to genomic sequences. It consists of two main steps:
  - identifying functional elements in the genome: structural annotation
  - attaching biological information to these elements: functional annotation
- biologists often use the term annotation when they are referring only to structural annotation
4 Structural annotation
[Figure: genome browser views labeled "DNA annotation" and "Protein annotation" for CHICK_OLF6. Data from Ensembl Genome browser.]
5 Functional annotation
6 Structural & Functional Annotation
- Structural Annotation
  - Open reading frames (ORFs) predicted during genome assembly
  - predicted ORFs require experimental confirmation
  - the Sequence Ontology (SO) provides a structured controlled vocabulary for sequence annotation
- Functional Annotation
  - annotation of gene products: Gene Ontology (GO) annotation
  - initially, predicted ORFs have no functional literature and GO annotation relies on computational methods (rapid)
  - functional literature exists for many genes/proteins prior to genome sequencing
  - GO annotation does not rely on a completed genome sequence!
7 AgBase
- Provides structural annotation for agriculturally important genomes
- Provides functional annotation (GO)
- Provides tools for functional modeling
- Provides bioinformatics modeling support for the research community
8 Introduction to GO
- pre-GO: managing large datasets
- Bio-ontologies
- the Gene Ontology (GO)
- a GO annotation example
- GO evidence codes
- literature biocuration and computational analysis
- ND vs no GO
- sources of GO
9 1. pre-GO: managing large datasets
10 AgBase User Support
- Functional modeling training
- Database ID mapping
  - approx. 75% of requests
- Providing GO annotation for datasets/arrays
- Assistance with GO modeling tools
- Intermediary between the research community and public databases
  - NCBI, UniProtKB, GO Consortium
- Computational assistance
11 Converting database accessions
- UniProt database
- Ensembl BioMart
- Online analysis tools
  - DAVID, gprofiler, etc.
- AgBase database
  - ArrayIDer tool
More information about these tools is available from the online workshop resources.
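Whichever tool is used, accession conversion comes down to a table lookup plus a list of accessions that failed to map. The following is a minimal sketch of that idea, not the method of any tool named above. The mapping table, the `probe_...` identifiers and the helper names `load_mapping` and `convert_accessions` are hypothetical, made up for illustration; only the UniProt accession P52505 comes from the workshop example.

```python
import csv
import io

# Hypothetical two-column mapping table (source accession -> UniProt accession).
# In practice this would be a file exported from an ID-mapping tool.
MAPPING_TSV = """source_id\tuniprot_id
probe_0001\tP52505
probe_0002\tP52505
probe_0003\t
"""


def load_mapping(handle):
    """Read a tab-delimited mapping table into a dict, skipping empty targets."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["source_id"]: row["uniprot_id"]
            for row in reader if row["uniprot_id"]}


def convert_accessions(accessions, mapping):
    """Return (converted, unmapped): a dict of hits and a list of misses."""
    converted = {}
    unmapped = []
    for acc in accessions:
        if acc in mapping:
            converted[acc] = mapping[acc]
        else:
            unmapped.append(acc)
    return converted, unmapped


mapping = load_mapping(io.StringIO(MAPPING_TSV))
converted, unmapped = convert_accessions(
    ["probe_0001", "probe_0003", "probe_9999"], mapping)
print("converted:", converted)
print("unmapped:", unmapped)
```

Keeping the unmapped list matters in practice: those accessions are the ones that need a second tool or a manual check.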
12 1. UniProt ID Mapping
13 2. Ensembl BioMart
NOTE: Ensembl is scheduled to add plant and microbe species in 2009.
14 3. Online analysis tools
gprofiler conversion tool: http://biit.cs.ut.ee/gprofiler/gconvert.cgi
This tool works for all species found in Ensembl.
15 3. Online analysis tools
Database for Annotation, Visualization and Integrated Discovery (DAVID): http://david.abcc.ncifcrf.gov/conversion.jsp
This tool works for a wide range of species.
16 4. AgBase ArrayIDer
Contact AgBase to request additional species.
18 2. Bio-ontologies
19 Bio-ontologies
- Bio-ontologies are used to capture biological information in a way that can be read by both humans and computers.
  - necessary for high-throughput "omics" datasets
  - allows data sharing across databases
- Objects in an ontology (e.g. genes, cell types, tissue types, stages of development) are well defined.
- The ontology shows how the objects relate to each other.
20 Bio-ontologies: http://www.obofoundry.org/
21 Ontologies
relationships between terms
digital identifier (computers)
description (humans)
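The three ingredients listed above (relationships between terms, a digital identifier, a description) can be sketched as a small data structure. This is an illustration only: the `Term` class and the `ancestors` helper are hypothetical names, the description strings are paraphrases written for this example, and the is_a links are a simplified assumption rather than a copy of the real GO graph.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    term_id: str       # digital identifier (for computers)
    name: str          # short term name
    description: str   # description (for humans)
    is_a: list = field(default_factory=list)  # relationships to parent terms


# Toy ontology. Term IDs and names are real GO terms; the links between them
# and the description text are simplified assumptions for illustration.
TERMS = {
    "GO:0008150": Term("GO:0008150", "biological_process",
                       "root term of the process ontology"),
    "GO:0008610": Term("GO:0008610", "lipid biosynthetic process",
                       "formation of lipids", is_a=["GO:0008150"]),
    "GO:0006633": Term("GO:0006633", "fatty acid biosynthetic process",
                       "formation of fatty acids", is_a=["GO:0008610"]),
}


def ancestors(term_id, terms):
    """Collect every term reachable by following is_a links upward."""
    seen = []
    stack = list(terms[term_id].is_a)
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(terms[parent].is_a)
    return seen


result = ancestors("GO:0006633", TERMS)
print([TERMS[t].name for t in result])
```

Because the relationships are stored explicitly, a gene product annotated to a specific term can also be found when searching by any of its parent terms.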
22 3. The Gene Ontology
23 Functional Annotation
- Gene Ontology (GO) is the de facto method for functional annotation
- Widely used for functional genomics (high throughput)
- Many tools available for gene expression analysis using GO
- The GO Consortium homepage: http://www.geneontology.org
24 GO Mapping Example
NDUFAB1 (UniProt P52505): Bovine NADH dehydrogenase (ubiquinone) 1, alpha/beta subcomplex, 1, 8kDa
Biological Process (BP or P)
- GO:0006633 fatty acid biosynthetic process TAS
- GO:0006120 mitochondrial electron transport, NADH to ubiquinone TAS
- GO:0008610 lipid biosynthetic process IEA
Molecular Function (MF or F)
- GO:0005504 fatty acid binding IDA
- GO:0008137 NADH dehydrogenase (ubiquinone) activity TAS
- GO:0016491 oxidoreductase activity TAS
- GO:0000036 acyl carrier activity IEA
Cellular Component (CC or C)
- GO:0005759 mitochondrial matrix IDA
- GO:0005747 mitochondrial respiratory chain complex I IDA
- GO:0005739 mitochondrion IEA
25 GO Mapping Example
Each annotation in the NDUFAB1 (UniProt P52505) example has four labelled parts:
- aspect or ontology, e.g. Biological Process (BP or P)
- GO ID (unique), e.g. GO:0006633
- GO term name, e.g. fatty acid biosynthetic process
- GO evidence code, e.g. TAS
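The four parts of an annotation can be pulled apart programmatically. The sketch below assumes a one-line layout of "aspect letter, GO ID, term name, evidence code" that was invented for this example; it is not an official GO file format, and `Annotation` and `parse` are hypothetical names. The three records are the Biological Process annotations from the NDUFAB1 example.

```python
import re
from collections import namedtuple

Annotation = namedtuple("Annotation", "aspect go_id term_name evidence")

# Biological Process annotations for NDUFAB1, in an ad hoc one-line layout.
LINES = [
    "P GO:0006633 fatty acid biosynthetic process TAS",
    "P GO:0006120 mitochondrial electron transport, NADH to ubiquinone TAS",
    "P GO:0008610 lipid biosynthetic process IEA",
]

# aspect letter, GO ID ("GO:" plus seven digits), term name, evidence code
PATTERN = re.compile(r"^([PFC]) (GO:\d{7}) (.+) ([A-Z]{2,3})$")


def parse(line):
    """Split one annotation line into its four labelled parts."""
    match = PATTERN.match(line)
    if match is None:
        raise ValueError(f"not an annotation line: {line!r}")
    return Annotation(*match.groups())


annotations = [parse(line) for line in LINES]
for ann in annotations:
    print(ann.go_id, "|", ann.term_name, "|", ann.evidence)
```

The GO ID is the stable key: term names can be reworded over time, so downstream analyses are safer when they join on the ID rather than the name.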
26 GO EVIDENCE CODES
Direct Evidence Codes
- IDA - inferred from direct assay
- IEP - inferred from expression pattern
- IGI - inferred from genetic interaction
- IMP - inferred from mutant phenotype
- IPI - inferred from physical interaction
Indirect Evidence Codes
inferred from literature
- IGC - inferred from genomic context
- TAS - traceable author statement
- NAS - non-traceable author statement
- IC - inferred by curator
inferred by sequence analysis
- RCA - inferred from reviewed computational analysis
- IS - inferred from sequence:
  - ISS - inferred from sequence or structural similarity
  - ISA - inferred from sequence alignment
  - ISO - inferred from sequence orthology
  - ISM - inferred from sequence model
- IEA - inferred from electronic annotation
Other
- NR - not recorded (historical)
- ND - no biological data available
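A lookup table makes it easy to summarize which kind of evidence supports a set of annotations. The sketch below follows the grouping used on this slide; the group labels and variable names are choices made for this example, and the list of codes is taken from the ten NDUFAB1 annotations in the GO Mapping Example.

```python
from collections import Counter

# Evidence codes grouped as on the slide (group labels are informal).
EVIDENCE_GROUPS = {
    "direct": ["IDA", "IEP", "IGI", "IMP", "IPI"],
    "literature": ["IGC", "TAS", "NAS", "IC"],
    "sequence analysis": ["RCA", "ISS", "ISA", "ISO", "ISM", "IEA"],
    "other": ["NR", "ND"],
}

# Invert the table: evidence code -> group.
CODE_TO_GROUP = {code: group
                 for group, codes in EVIDENCE_GROUPS.items()
                 for code in codes}

# Evidence codes of the NDUFAB1 annotations (BP, then MF, then CC).
ndufab1_codes = ["TAS", "TAS", "IEA",
                 "IDA", "TAS", "TAS", "IEA",
                 "IDA", "IDA", "IEA"]

counts = Counter(CODE_TO_GROUP[code] for code in ndufab1_codes)
for group, n in sorted(counts.items()):
    print(group, n)
```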
27 GO EVIDENCE CODES
- Biocuration of literature
  - detailed function
  - depth
  - slower (manual)
28 Biocuration of Literature: detailed gene function
Find a paper about the protein.
29 Read paper to get experimental evidence of function
experiment assayed kinase activity: use IDA evidence code
30 GO EVIDENCE CODES
- Biocuration of literature
  - detailed function
  - depth
  - slower (manual)
- Sequence analysis
  - rapid (computational)
  - breadth of coverage
  - less detailed
31 Computational GO annotation (breadth)
ISO PIPELINE
accessions from your species (species 1)
public orthology prediction tool(s)
1:1 orthologs
existing GO annotations
transfer GO annotation to your species (ISO)
accessions with no ISO
ga file
(integrate output into one ga file)
Ranjit Kumar
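The transfer step of the pipeline can be sketched in a few lines. This is an illustration under stated assumptions, not the actual AgBase pipeline: the function name `transfer_iso`, the accessions `SP1_...` and `SP2_...` and the ortholog table are hypothetical, and the orthology prediction itself is assumed to have been done already by an external tool.

```python
def transfer_iso(accessions, orthologs, donor_annotations):
    """Transfer GO IDs across 1:1 orthologs.

    accessions        -- accessions from your species (species 1)
    orthologs         -- dict: species 1 accession -> its single ortholog
    donor_annotations -- dict: ortholog accession -> list of existing GO IDs
    Returns (rows, no_iso): annotation rows tagged ISO, and the accessions
    for which nothing could be transferred.
    """
    rows = []
    no_iso = []
    for acc in accessions:
        partner = orthologs.get(acc)
        go_ids = donor_annotations.get(partner, [])
        if not go_ids:
            no_iso.append(acc)
            continue
        for go_id in go_ids:
            # Keep the ortholog so the source of the annotation stays traceable.
            rows.append((acc, go_id, "ISO", partner))
    return rows, no_iso


# Hypothetical inputs: SP1_C has no ortholog, SP1_B's ortholog has no GO.
accessions = ["SP1_A", "SP1_B", "SP1_C"]
orthologs = {"SP1_A": "SP2_A", "SP1_B": "SP2_B"}
donor_annotations = {"SP2_A": ["GO:0005739", "GO:0005759"]}

rows, no_iso = transfer_iso(accessions, orthologs, donor_annotations)
print(rows)
print("no ISO:", no_iso)
```

The `no_iso` list corresponds to the "accessions with no ISO" branch of the flowchart: those accessions need another annotation route.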
32 Unknown Function vs No GO
- ND: no data
  - Biocurators have tried to add GO but there is no functional data available
  - Previously: process_unknown, function_unknown, component_unknown
  - Now: biological process, molecular function, cellular component
- No annotations (including no ND): biocurators have not annotated
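The distinction above can be made explicit when filtering a dataset. In this sketch a gene product whose only annotations are ND annotations to the three root terms is reported separately from one with no annotations at all. The function name, the status labels and the `gene_...` identifiers are hypothetical.

```python
# Root terms of the three ontologies, which now carry ND annotations.
ROOT_TERMS = {
    "GO:0008150": "biological process",
    "GO:0003674": "molecular function",
    "GO:0005575": "cellular component",
}


def annotation_status(annotations):
    """Classify one gene product from its list of (go_id, evidence_code)."""
    if not annotations:
        # Nothing at all, not even ND: biocurators have not looked yet.
        return "not annotated"
    if all(code == "ND" and go_id in ROOT_TERMS
           for go_id, code in annotations):
        # Biocurators looked and found no functional data.
        return "no data (ND)"
    return "annotated"


# Hypothetical gene products illustrating the three cases.
examples = {
    "gene_a": [("GO:0005739", "IEA"), ("GO:0008150", "ND")],
    "gene_b": [("GO:0008150", "ND"), ("GO:0003674", "ND"), ("GO:0005575", "ND")],
    "gene_c": [],
}

status = {gene: annotation_status(anns) for gene, anns in examples.items()}
print(status)
```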
33 Sources of GO
- Primary sources of GO: from the GO Consortium (GOC) & GOC members
  - most up to date
  - most comprehensive
- Secondary sources: other resources that use GO provided by GOC members
  - public databases (e.g. NCBI, UniProtKB)
  - genome browsers (e.g. Ensembl)
  - array vendors (e.g. Affymetrix)
  - GO expression analysis tools
34
- Different tools and databases display the GO annotations differently.
- Since GO terms are continually changing and GO annotations are continually added, you need to know when GO annotations were last updated.
35 Secondary Sources of GO annotation
- EXAMPLES
  - public databases (e.g. NCBI, UniProtKB)
  - genome browsers (e.g. Ensembl)
  - array vendors (e.g. Affymetrix)
- CONSIDERATIONS
  - What is the original source?
  - When was it last updated?
  - Are evidence codes displayed?
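When annotations are available as a gene association file, the three considerations can be checked directly, because the tab-delimited format records the evidence code (column 7), the annotation date as YYYYMMDD (column 14) and the assigning group (column 15). The sketch below builds two example rows in memory; the dates, the `SOURCE_A`/`SOURCE_B` labels and the `REF:placeholder` reference are made-up placeholders, and `make_row` and `summarize` are hypothetical helper names.

```python
from datetime import datetime


def make_row(go_id, evidence, aspect, date, assigned_by):
    """Build one 15-column gene association row (placeholder values)."""
    fields = [
        "UniProtKB", "P52505", "NDUFAB1", "", go_id, "REF:placeholder",
        evidence, "", aspect, "", "", "protein", "taxon:9913",
        date, assigned_by,
    ]
    return "\t".join(fields)


GAF_LINES = [
    "!gaf-version: 1.0",  # lines starting with "!" are comments
    make_row("GO:0005739", "IEA", "C", "20090101", "SOURCE_A"),
    make_row("GO:0005759", "IDA", "C", "20090501", "SOURCE_B"),
]


def summarize(lines):
    """Report original sources, most recent date and evidence codes."""
    latest = None
    sources = set()
    codes = set()
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        codes.add(cols[6])      # column 7: evidence code
        sources.add(cols[14])   # column 15: assigned by
        stamp = datetime.strptime(cols[13], "%Y%m%d").date()  # column 14
        if latest is None or stamp > latest:
            latest = stamp
    return {
        "last_updated": latest,
        "sources": sorted(sources),
        "evidence_codes": sorted(codes),
    }


summary = summarize(GAF_LINES)
print(summary)
```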
37 For more information about GO
- GO Evidence Codes: http://www.geneontology.org/GO.evidence.shtml
- gene association file information: http://www.geneontology.org/GO.format.annotation.shtml
- tools that use the GO: http://www.geneontology.org/GO.tools.shtml
- GO Consortium wiki: http://wiki.geneontology.org/index.php/Main_Page
All websites are available from the workshop website handout.