Introduction to the GO: a user's guide

Provided by: Fio51
1
Introduction to the GO: a user's guide
  • Iowa State Workshop
  • 11 June 2009

2
All workshop materials are available at AgBase.
3
Genomic Annotation
  • Genome annotation is the process of attaching
    biological information to genomic sequences. It
    consists of two main steps:
  • identifying functional elements in the genome:
    structural annotation
  • attaching biological information to these
    elements: functional annotation
  • biologists often use the term "annotation" when
    they are referring only to structural annotation

4
Structural annotation
DNA annotation
CHICK_OLF6
Protein annotation
Data from Ensembl Genome browser
5
Functional annotation
6
Structural & Functional Annotation
  • Structural Annotation
  • Open reading frames (ORFs) predicted during
    genome assembly
  • predicted ORFs require experimental confirmation
  • the Sequence Ontology (SO) provides a structured
    controlled vocabulary for sequence annotation
  • Functional Annotation
  • annotation of gene products: Gene Ontology (GO)
    annotation
  • initially, predicted ORFs have no functional
    literature and GO annotation relies on
    computational methods (rapid)
  • functional literature exists for many
    genes/proteins prior to genome sequencing
  • GO annotation does not rely on a completed genome
    sequence!

7
  • Provides structural annotation for agriculturally
    important genomes
  • Provides functional annotation (GO)
  • Provides tools for functional modeling
  • Provides bioinformatics modeling support for
    research community

8
Introduction to GO
  • pre-GO: managing large datasets
  • Bio-ontologies
  • the Gene Ontology (GO)
  • a GO annotation example
  • GO evidence codes
  • literature biocuration & computational analysis
  • ND vs no GO
  • sources of GO

9
1. pre-GO: managing large datasets
10
AgBase User Support
  • Functional modeling training
  • Database ID mapping
  • approx. 75% of requests
  • Providing GO annotation for datasets/arrays
  • Assistance with GO modeling tools
  • Intermediary between the research community and
    public databases
  • NCBI, UniProtKB, GO Consortium
  • Computational assistance

11
Converting database accessions
  • UniProt database
  • Ensembl BioMart
  • Online analysis tools
  • DAVID, g:Profiler, etc.
  • AgBase database
  • ArrayIDer tool

More information about these tools is available
from the online workshop resources.
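
Whichever tool produces it, the result of an ID conversion is usually a two-column table (your accession, target accession). A minimal sketch of applying such a table to a list of accessions, assuming a tab-separated export; the function names and the probe IDs in the usage note are hypothetical and not taken from any of the tools above:

```python
import csv
import io


def load_mapping(tsv_text):
    """Read 'source_id<TAB>target_id' rows into a dict of lists.

    One source ID may map to several target IDs, so values are lists.
    """
    mapping = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) >= 2 and row[0] and row[1]:
            mapping.setdefault(row[0], []).append(row[1])
    return mapping


def convert(ids, mapping):
    """Return (converted, unmapped) for a list of source IDs."""
    converted = {i: mapping[i] for i in ids if i in mapping}
    unmapped = [i for i in ids if i not in mapping]
    return converted, unmapped
```

For example, `convert(["PROBE_1", "PROBE_9"], load_mapping("PROBE_1\tP52505\n"))` reports `PROBE_9` as unmapped. Keeping the unmapped list makes it easy to see which accessions still need another tool or a request for help.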
12
1. UniProt ID Mapping
13
2. Ensembl BioMart
NOTE: Ensembl is scheduled to add plant & microbe
species in 2009.
14
3. Online analysis tools
g:Profiler conversion tool:
http://biit.cs.ut.ee/gprofiler/gconvert.cgi
This tool works for all species found in Ensembl.
15
3. Online analysis tools
Database for Annotation, Visualization and
Integrated Discovery (DAVID):
http://david.abcc.ncifcrf.gov/conversion.jsp
This tool works for a wide range of species.
16
4. AgBase ArrayIDer
Contact AgBase to request additional species.
17
(No Transcript)
18
2. Bio-ontologies
19
Bio-ontologies
  • Bio-ontologies are used to capture biological
    information in a way that can be read by both
    humans and computers.
  • necessary for high-throughput omics datasets
  • allows data sharing across databases
  • Objects in an ontology (e.g. genes, cell types,
    tissue types, stages of development) are well
    defined.
  • The ontology shows how the objects relate to each
    other.

20
Bio-ontologies: http://www.obofoundry.org/
21
Ontologies
relationships between terms
digital identifier (computers)
description (humans)
22
3. The Gene Ontology
23
Functional Annotation
  • Gene Ontology (GO) is the de facto method for
    functional annotation
  • Widely used for functional genomics (high
    throughput)
  • Many tools available for gene expression analysis
    using GO
  • The GO Consortium homepage:
    http://www.geneontology.org
24
GO Mapping Example
NDUFAB1 (UniProt P52505): Bovine NADH
dehydrogenase (ubiquinone) 1, alpha/beta
subcomplex, 1, 8kDa
Biological Process (BP or P)
  • GO:0006633 fatty acid biosynthetic process TAS
  • GO:0006120 mitochondrial electron transport,
    NADH to ubiquinone TAS
  • GO:0008610 lipid biosynthetic process IEA
Molecular Function (MF or F)
  • GO:0005504 fatty acid binding IDA
  • GO:0008137 NADH dehydrogenase (ubiquinone)
    activity TAS
  • GO:0016491 oxidoreductase activity TAS
  • GO:0000036 acyl carrier activity IEA
Cellular Component (CC or C)
  • GO:0005759 mitochondrial matrix IDA
  • GO:0005747 mitochondrial respiratory chain
    complex I IDA
  • GO:0005739 mitochondrion IEA
25
GO Mapping Example
NDUFAB1 (UniProt P52505): Bovine NADH
dehydrogenase (ubiquinone) 1, alpha/beta
subcomplex, 1, 8kDa
Each heading is the aspect or ontology. Each entry
gives the GOID (unique), the GO term name and the
GO evidence code.
Biological Process (BP or P)
  • GO:0006633 fatty acid biosynthetic process TAS
  • GO:0006120 mitochondrial electron transport,
    NADH to ubiquinone TAS
  • GO:0008610 lipid biosynthetic process IEA
Molecular Function (MF or F)
  • GO:0005504 fatty acid binding IDA
  • GO:0008137 NADH dehydrogenase (ubiquinone)
    activity TAS
  • GO:0016491 oxidoreductase activity TAS
  • GO:0000036 acyl carrier activity IEA
Cellular Component (CC or C)
  • GO:0005759 mitochondrial matrix IDA
  • GO:0005747 mitochondrial respiratory chain
    complex I IDA
  • GO:0005739 mitochondrion IEA
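
The four parts labelled on this slide (GO ID, term name, evidence code, aspect) map naturally onto a small record type. The following is a minimal Python sketch, assuming GO IDs are written with their colon (for example GO:0006633) and using the NDUFAB1 entries from the slide. The class and function names are illustrative only and are not part of any GO tool.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# A GO ID is the prefix "GO:" followed by exactly seven digits.
GO_ID_PATTERN = re.compile(r"^GO:\d{7}$")
ASPECTS = {"P": "Biological Process", "F": "Molecular Function", "C": "Cellular Component"}


@dataclass(frozen=True)
class GoAnnotation:
    go_id: str      # unique GO ID, e.g. "GO:0006633"
    term_name: str  # human-readable GO term name
    evidence: str   # GO evidence code, e.g. "IDA"
    aspect: str     # "P", "F" or "C"

    def __post_init__(self):
        if not GO_ID_PATTERN.match(self.go_id):
            raise ValueError(f"malformed GO ID: {self.go_id!r}")
        if self.aspect not in ASPECTS:
            raise ValueError(f"unknown aspect: {self.aspect!r}")


NDUFAB1 = [
    GoAnnotation("GO:0006633", "fatty acid biosynthetic process", "TAS", "P"),
    GoAnnotation("GO:0006120", "mitochondrial electron transport, NADH to ubiquinone", "TAS", "P"),
    GoAnnotation("GO:0008610", "lipid biosynthetic process", "IEA", "P"),
    GoAnnotation("GO:0005504", "fatty acid binding", "IDA", "F"),
    GoAnnotation("GO:0008137", "NADH dehydrogenase (ubiquinone) activity", "TAS", "F"),
    GoAnnotation("GO:0016491", "oxidoreductase activity", "TAS", "F"),
    GoAnnotation("GO:0000036", "acyl carrier activity", "IEA", "F"),
    GoAnnotation("GO:0005759", "mitochondrial matrix", "IDA", "C"),
    GoAnnotation("GO:0005747", "mitochondrial respiratory chain complex I", "IDA", "C"),
    GoAnnotation("GO:0005739", "mitochondrion", "IEA", "C"),
]


def group_by_aspect(annotations):
    """Return a dict mapping each aspect letter to its GO IDs, in input order."""
    grouped = defaultdict(list)
    for annotation in annotations:
        grouped[annotation.aspect].append(annotation.go_id)
    return dict(grouped)
```

Validating the ID format at construction time catches the most common transcription error, a dropped colon, before it reaches any downstream analysis.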
26
GO EVIDENCE CODES
Direct Evidence Codes
  • IDA - inferred from direct assay
  • IEP - inferred from expression pattern
  • IGI - inferred from genetic interaction
  • IMP - inferred from mutant phenotype
  • IPI - inferred from physical interaction
Indirect Evidence Codes
inferred from literature
  • IGC - inferred from genomic context
  • TAS - traceable author statement
  • NAS - non-traceable author statement
  • IC - inferred by curator
inferred by sequence analysis
  • RCA - inferred from reviewed computational
    analysis
  • IS - inferred from sequence
  • IEA - inferred from electronic annotation
Other
  • NR - not recorded (historical)
  • ND - no biological data available
Inferred from sequence (IS) codes
  • ISS - inferred from sequence or structural
    similarity
  • ISA - inferred from sequence alignment
  • ISO - inferred from sequence orthology
  • ISM - inferred from sequence model
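
The grouping on this slide can be written down as a lookup table, which makes it easy to filter annotations by how they were obtained. This is a minimal sketch that follows the slide's own grouping (not an official GO Consortium classification); the dictionary and function names are illustrative assumptions.

```python
# Evidence codes grouped as on the slide above.
EVIDENCE_GROUPS = {
    "direct": {"IDA", "IEP", "IGI", "IMP", "IPI"},
    "literature": {"IGC", "TAS", "NAS", "IC"},
    "sequence analysis": {"RCA", "ISS", "ISA", "ISO", "ISM", "IEA"},
    "other": {"NR", "ND"},
}


def evidence_group(code):
    """Return the slide's group name for one evidence code."""
    for group, codes in EVIDENCE_GROUPS.items():
        if code in codes:
            return group
    raise KeyError(f"unknown evidence code: {code!r}")


def keep_direct(annotations):
    """Keep only (go_id, evidence_code) pairs backed by direct evidence."""
    return [(go_id, code) for go_id, code in annotations
            if evidence_group(code) == "direct"]


# Molecular Function annotations for NDUFAB1, taken from the mapping example.
ndufab1_mf = [
    ("GO:0005504", "IDA"),
    ("GO:0008137", "TAS"),
    ("GO:0016491", "TAS"),
    ("GO:0000036", "IEA"),
]
direct_only = keep_direct(ndufab1_mf)
```

Here `direct_only` contains only the fatty acid binding annotation (GO:0005504), the one Molecular Function term supported by a direct assay.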
27
GO EVIDENCE CODES
GO Mapping Example
  • Biocuration of literature
  • detailed function
  • depth
  • slower (manual)

28
Biocuration of Literature: detailed gene function
Find a paper about the protein.
29
Read paper to get experimental evidence of
function
experiment assayed kinase activity: use IDA
evidence code
30
GO EVIDENCE CODES
GO Mapping Example
  • Biocuration of literature
  • detailed function
  • depth
  • slower (manual)
  • Sequence analysis
  • rapid (computational)
  • breadth of coverage
  • less detailed

31
Computational GO annotation (breadth)
ISO PIPELINE
accessions from your species (species 1)
public orthology prediction tool(s)
1:1 orthologs
existing GO annotations
transfer GO annotation to your species (ISO)
accessions with no ISO
gene association (ga) file
(integrate output into one ga file)
Ranjit Kumar
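
The pipeline above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline's actual implementation: the orthology table is assumed to come from a public prediction tool as a dict, an accession with exactly one listed partner is treated as a 1:1 ortholog (the reverse direction is assumed to have been checked by the tool), and all names and the `SP1_*` accessions in the usage note are hypothetical.

```python
def transfer_by_orthology(accessions, orthologs, known_go):
    """Transfer GO annotations across 1:1 orthologs.

    accessions: accessions from your species (species 1)
    orthologs:  dict, species-1 accession -> list of predicted orthologs
    known_go:   dict, ortholog accession -> list of existing GO IDs
    Returns (transferred, no_iso):
      transferred: rows of (accession, go_id, "ISO", ortholog)
      no_iso:      accessions that received no ISO annotation
    """
    transferred = []
    no_iso = []
    for accession in accessions:
        partners = orthologs.get(accession, [])
        # Only 1:1 orthologs are used; anything else gets no ISO annotation.
        go_ids = known_go.get(partners[0], []) if len(partners) == 1 else []
        if not go_ids:
            no_iso.append(accession)
            continue
        for go_id in go_ids:
            transferred.append((accession, go_id, "ISO", partners[0]))
    return transferred, no_iso
```

For example, with `orthologs = {"SP1_A": ["P52505"]}` and `known_go = {"P52505": ["GO:0005759"]}`, the accession `SP1_A` receives GO:0005759 with evidence code ISO, while any accession without a single partner lands in `no_iso`. Recording the source ortholog in each row keeps the transferred annotation traceable when the rows are later written to one ga file.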
32
Unknown Function vs No GO
  • ND: no data
  • Biocurators have tried to add GO but there is no
    functional data available
  • Previously: process_unknown, function_unknown,
    component_unknown
  • Now: biological process, molecular function,
    cellular component
  • No annotations (including no ND): biocurators
    have not annotated
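
The distinction on this slide matters when counting "unannotated" genes in a dataset. A minimal sketch, assuming each gene product comes with the list of evidence codes attached to it; the function name and the status labels are illustrative:

```python
def annotation_status(evidence_codes):
    """Classify one gene product from the evidence codes of its annotations.

    - no annotations at all: biocurators have not annotated it yet
    - only ND annotations:   biocurators looked and found no functional data
    - anything else:         at least one informative GO annotation exists
    """
    if not evidence_codes:
        return "not annotated"
    if all(code == "ND" for code in evidence_codes):
        return "no data (ND)"
    return "annotated"
```

Treating the two cases separately avoids reporting a gene as "unknown function" when nobody has yet looked for the evidence.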

33
  • Primary sources of GO: from the GO Consortium
    (GOC) & GOC members
  • most up to date
  • most comprehensive
  • Secondary sources: other resources that use GO
    provided by GOC members
  • public databases (e.g. NCBI, UniProtKB)
  • genome browsers (e.g. Ensembl)
  • array vendors (e.g. Affymetrix)
  • GO expression analysis tools

34
  • Different tools and databases display the GO
    annotations differently.
  • Since GO terms are continually changing and GO
    annotations are continually added, you need to
    know when GO annotations were last updated.

35
Secondary Sources of GO annotation
  • EXAMPLES
  • public databases (e.g. NCBI, UniProtKB)
  • genome browsers (e.g. Ensembl)
  • array vendors (e.g. Affymetrix)
  • CONSIDERATIONS
  • What is the original source?
  • When was it last updated?
  • Are evidence codes displayed?
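
When the annotations are available as a gene association file, the three questions above can be answered directly from its columns. A minimal sketch, assuming the tab-separated GAF 2.0 layout in which column 7 is the evidence code, column 14 the date (YYYYMMDD) and column 15 the assigning group, with comment lines starting with "!". The helper `gaf_row` only builds illustrative rows; its reference value and any dates or source names passed to it are hypothetical.

```python
from datetime import datetime


def summarize_gaf(lines):
    """Report who assigned the annotations, which evidence codes occur,
    and the most recent annotation date in GAF-formatted lines."""
    sources, codes, latest = set(), set(), None
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # skip header/comment lines and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 15:
            continue  # not a complete annotation row
        codes.add(cols[6])
        annotated_on = datetime.strptime(cols[13], "%Y%m%d").date()
        if latest is None or annotated_on > latest:
            latest = annotated_on
        sources.add(cols[14])
    return {"assigned_by": sources, "evidence_codes": codes, "latest_date": latest}


def gaf_row(go_id, evidence, aspect, date, assigned_by):
    """Build one GAF-style row for NDUFAB1 (illustrative values only)."""
    return "\t".join([
        "UniProtKB", "P52505", "NDUFAB1", "", go_id, "REF:hypothetical",
        evidence, "", aspect, "", "", "protein", "taxon:9913", date, assigned_by,
    ])
```

Calling `summarize_gaf` on the lines of a downloaded file gives the original sources, the evidence codes present, and the date of the newest annotation, which is a quick check on how current a secondary source is.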

36
(No Transcript)
37
For more information about GO
  • GO Evidence Codes:
    http://www.geneontology.org/GO.evidence.shtml
  • gene association file information:
    http://www.geneontology.org/GO.format.annotation.shtml
  • tools that use the GO:
    http://www.geneontology.org/GO.tools.shtml
  • GO Consortium wiki:
    http://wiki.geneontology.org/index.php/Main_Page

All websites are available from the workshop
website handout.