A QuickStart Guide to Using PhyloFacts

About This Presentation

Slides: 12

Provided by: kimmensj

1
A Quick-Start Guide to Using PhyloFacts

February 21, 2008

2
Overview
A  Background
B  Browsing the library and reading PhyloFacts books
C  Submitting sequences for functional and structural classification
D  Database queries
3
Background
Retrieve paper
http://phylogenomics.berkeley.edu/phylofacts/
4
Background
Simple overview of different webservers (how to
use)
Detailed description of PhyloFacts construction
and recommended use and interpretation
5
Homology-based functional annotations are fraught
with systematic error
Background
Gilks et al., "Modeling the percolation of annotation errors in a database of protein sequences," Bioinformatics, 2002.
Galperin and Koonin, "Sources of Systematic Error in Functional Annotation of Genomes," In Silico Biology, 1998.
Brenner, "Errors in Genome Annotation," Trends Genet., 1999.
Brown and Sjölander, "Functional Classification using Phylogenomic Inference," PLoS Computational Biology, 2006.
6
Structural phylogenomic inference of protein
function addresses these errors
Background
7
Phylogenomic library construction
Background
Cluster genome into global homology groups
8
Types of PhyloFacts books
Background

Global homology: sequences that share a common domain architecture
  Alignable over entire length
  Homologs retrieved using FlowerPower
Domain: sequences that contain a structural domain
  Seeded using a PDB structure or SCOP domain
Conserved region: sequences that share a region of similarity
  Correspondence to structure unknown
Motif: short regions (typically <50 aa) conserved for functional reasons

9
Proteins are composed of modular structural
domains which are found in different domain
architectures
Background
Leucine-Rich Repeat (LRR)
Toll-Interleukin Receptor (TIR) domain
PhyloFacts Global Homology books include only
those sequences that can be predicted to share
the same domain architecture (series of
structural domains). These are more suitable for
predicting function. PhyloFacts Domain books
model individual domains that may be found in
different domain architectures; these thus
include sequences with different overall folds
and functions.
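
The contrast between the two book types can be illustrated with a minimal sketch. It assumes each sequence has already been annotated with an ordered list of predicted domains. The sequence IDs, the "Kinase" domain, and both function names are hypothetical and are not part of PhyloFacts; only LRR and TIR come from the slide. Grouping by the full ordered series mimics a Global Homology book, while grouping by a single shared domain mimics a Domain book.

```python
from collections import defaultdict

# Hypothetical toy data: sequence ID -> ordered list of predicted domains.
# The IDs and the "Kinase" domain are invented for illustration only.
sequences = {
    "seqA": ["LRR", "TIR"],
    "seqB": ["LRR", "TIR"],
    "seqC": ["TIR"],
    "seqD": ["LRR", "Kinase"],
}


def group_by_architecture(seqs):
    """Group sequences whose full, ordered domain series is identical."""
    groups = defaultdict(list)
    for seq_id, domains in seqs.items():
        groups[tuple(domains)].append(seq_id)
    return dict(groups)


def group_by_domain(seqs):
    """Group sequences that contain a given domain, whatever else they carry."""
    groups = defaultdict(list)
    for seq_id, domains in seqs.items():
        for domain in set(domains):  # count each domain once per sequence
            groups[domain].append(seq_id)
    return dict(groups)


architecture_groups = group_by_architecture(sequences)
domain_groups = group_by_domain(sequences)
```

In this toy data the TIR domain group mixes three sequences with two different architectures, whereas the (LRR, TIR) architecture group holds only the two sequences that match end to end, which is why the latter is the safer basis for transferring a functional annotation.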
10
KEGG Orthology Group K00002 spans five domain
architectures
Background
Group 1: Zinc-binding dehydrogenase (all cellular organisms)
ADH_N
ADH_zinc_N
Group 2: Iron-binding dehydrogenase (all cellular organisms)
Group 3: Cofactor-binding domain of zinc-binding dehydrogenase (Bacteria/Eukarya)
ADH_zinc_N
Group 4: Sequences of unknown function (Halobacterium)
ADH_zinc_N
PF02894
Group 5: Aldo-keto reductase (Bacteria/Eukarya)
11
Summary

Each book in PhyloFacts contains:
a multiple sequence alignment
one or more phylogenetic trees
hidden Markov models for each subfamily and family
predicted PFAM domains
predicted transmembrane helices
predicted subfamilies
homologous solved 3D structures
predicted functional residues
GO annotations and evidence codes
UniProt definitions
links to literature
links to genome databases and other external resources
Graphical user interfaces to view:
multiple sequence alignment
phylogenetic tree(s)
3D structure
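
As a reading aid, the contents of a book can be pictured as a simple record. The sketch below is an assumption made for illustration: the class names, field names, and example values are hypothetical and do not reflect the actual PhyloFacts schema or any PhyloFacts API. It only mirrors the items listed on this slide and the four book types introduced earlier.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class BookType(Enum):
    """The four book types named in the slides."""
    GLOBAL_HOMOLOGY = "global homology"
    DOMAIN = "domain"
    CONSERVED_REGION = "conserved region"
    MOTIF = "motif"


@dataclass
class Book:
    """Hypothetical container mirroring the per-book contents listed above."""
    book_type: BookType
    alignment: List[str]  # rows of the multiple sequence alignment
    trees: List[str]      # one or more trees, e.g. as Newick strings
    hmms: List[str] = field(default_factory=list)
    pfam_domains: List[str] = field(default_factory=list)
    transmembrane_helices: List[str] = field(default_factory=list)
    subfamilies: List[str] = field(default_factory=list)
    structures: List[str] = field(default_factory=list)
    functional_residues: List[int] = field(default_factory=list)
    go_annotations: List[str] = field(default_factory=list)
    uniprot_definitions: List[str] = field(default_factory=list)
    literature_links: List[str] = field(default_factory=list)
    external_links: List[str] = field(default_factory=list)

    def __post_init__(self):
        # The slide lists an alignment and at least one tree for every book.
        if not self.alignment:
            raise ValueError("a book needs a multiple sequence alignment")
        if not self.trees:
            raise ValueError("a book needs at least one phylogenetic tree")

    def has_structure(self) -> bool:
        """True if at least one homologous solved 3D structure is recorded."""
        return bool(self.structures)


# Invented example values, for illustration only.
example = Book(
    book_type=BookType.DOMAIN,
    alignment=["MKV-LA", "MKVQLA"],
    trees=["(seq1,seq2);"],
    structures=["hypothetical_structure_id"],
)
```

Treating the alignment and tree as required fields and everything else as optional is a design choice of this sketch, not a statement about how PhyloFacts stores its data.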

PhyloFacts is an encyclopedia of protein families across the Tree of Life.
The majority of PhyloFacts books represent proteins sharing a common domain architecture.
The second-largest fraction is based on protein structures and structural domains.
Functional annotation of a sequence included in a PhyloFacts book is enabled by examining the sequence in its evolutionary context.
New sequences can be classified to PhyloFacts families and subfamilies using the Sequence Search page.
Results include functional classification, prediction of 3D structure, and detection of remote homologs.