Orthology Analysis

Slides: 54
Provided by: erikson
1
Orthology Analysis
Erik Sonnhammer
Center for Genomics and Bioinformatics
Karolinska Institutet, Stockholm
2
Outline
  • Basic concepts
  • BLAST-based approaches to orthology
  • Tree-based approaches to orthology
  • Domain-level orthology

3
Homologs
  • Genes with a common origin
  • May be genes in the same or in different
    organisms
  • Does not say that function is identical
  • Can only be true or false, and not a percentage!
  • Homologs have the same 3D-structure layout

4
Homologs
Orthologs
Paralogs
5
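Orthologs separated by speciation
[Figure: Gene X in an ancient mammal gives rise, by speciation, to Gene X in human and Gene X in rat; these two genes are orthologs.]
[Figure: gene tree along a time axis with speciation (S) and duplication (D) nodes, marking orthologs, paralogs, in-paralogs, and out-paralogs.]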
6
In/Out-paralog definition
  • In-paralogs = co-orthologs
  • Paralogs that were duplicated after the
    speciation and hence are orthologs to a cluster
    in the other species
  • Out-paralogs = not co-orthologs
  • Paralogs that were duplicated before the
    speciation. Not necessarily in the same species.

Sonnhammer & Koonin, Trends Genet. 18:619-620
(2002)
7
Orthologs for functional genomics
  • Co-orthologs / inparalogs are more likely than
    outparalogs to have identical biochemical
    functions and biological roles.
  • Co-orthologs can be used to discover human gene
    function via model organism experiments
  • Co-orthologs are key to exploiting functional
    genomics/proteomics data in model organisms

8
Orthology and function conservation
  • Orthology does not say anything about
    evolutionary distance.
  • Close orthologs, e.g. human-mouse are very likely
    to have the same biological role in the organism.
  • Distant orthologs, e.g. human-worm are less
    likely to have the same phenotypical role, but
    may have the same role in the corresponding
    pathway.

9
Ortholog Databases
Sequence database     Orthology detection method   Ortholog database
Sw+Trembl proteomes   Inparanoid (blast)           Inparanoid
proteomes             COGs (blast)                 COGs / KOGs
TIGR gene index       COGs (blast)                 TOGA/EGO
proteomes             OrthoMCL (blast)             OrthoMCL
Pfam                  Orthostrapper (tree)         HOPS
Pfam                  RIO (tree)
10
How to find orthologs?
  1. Calculate a phylogenetic tree and look for
    orthologs in the tree (Orthostrapper, RIO)
  2. Two-way best matches between two species can
    be used to find orthologs without trees.
    However, in-paralogs are harder to find this way.
11
Two-way best match approach to finding orthologs
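The two-way best match idea can be sketched in a few lines. This is a minimal illustration, not the Inparanoid implementation: the gene names and scores are made up, and `best_hits` and `two_way_best_matches` are hypothetical helper names. Scores stand in for similarity scores such as BLAST bit scores.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    `scores` is a dict {(query, subject): score}; on ties the first hit seen wins.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def two_way_best_matches(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical similarity scores between species 1 (a*) and species 2 (b*).
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 120,
          ("a2", "b1"): 150, ("a2", "b2"): 90}
b_vs_a = {("b1", "a1"): 198, ("b1", "a2"): 150,
          ("b2", "a1"): 118, ("b2", "a2"): 90}

pairs = two_way_best_matches(a_vs_b, b_vs_a)
print(pairs)  # [('a1', 'b1')]
```

In this toy data a2 also hits b1 strongly but is not part of any two-way best match, which is the situation where a possible in-paralog is missed by this approach alone.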
12
COGs
  • COG2813

Out-paralogs
13
Blue = species 1, Red = species 2
Inparalog n ortholog identification
Inpara-n-oid
14
Blue = species 1, Red = species 2
Inparanoid
15
Resolve overlapping clusters
No overlap - no problems
Partial overlap - separate
Complete overlap - merge
16
Inparalog score
[Figure: score axis from 0 to 100 with sequences A, B, and P marked.]
Score for inparalog P = (scoreAP - scoreAB) / (scoreAA - scoreAB)
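As a worked example of the inparalog score formula on this slide, the sketch below evaluates it for made-up scores. The function name `inparalog_score` and all numbers are assumptions for illustration; multiplying the result by 100 gives a value on the slide's 0 to 100 scale.

```python
def inparalog_score(score_ap, score_ab, score_aa):
    """(scoreAP - scoreAB) / (scoreAA - scoreAB), as given on the slide.

    score_aa: score of main ortholog A against itself
    score_ab: score of A against its main ortholog B in the other species
    score_ap: score of A against the candidate inparalog P
    """
    denominator = score_aa - score_ab
    if denominator <= 0:
        raise ValueError("scoreAA must be larger than scoreAB")
    return (score_ap - score_ab) / denominator


# Hypothetical scores: P lies halfway between B and A itself.
score = inparalog_score(score_ap=350, score_ab=200, score_aa=500)
print(score)  # 0.5
```

A candidate that scores as well against A as A does against itself gets 1.0, and one that scores no better than B gets 0.0.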
17
Confidence values for main orthologs from sampling
  TVHIVDDEEPVR---KSLAFM---LTMNGFA
  T    DD    R   K L  M    T  G A
  TILLIDDHPMLRTGVKQLISMAPDITVVGEA
  • Sampling with replacement,
    insertions kept intact
  GAFDEP---LVTHVR..........
  GA         T  R
  GAEEHMAPDILTLLR..........
  • Bootstrap
    alignment -> bootstrap score
  • Confidence = (bootstrap alignments with best-best
    matches) / (nr of bootstraps)
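The confidence calculation above can be sketched as follows. This is a simplified illustration under stated assumptions: the bootstrap score is a plain count of identical columns rather than a substitution-matrix score, single columns are resampled (insertions are not kept intact as on the slide), ties count as best matches, and all function names and the two extra alignments are hypothetical.

```python
import random


def bootstrap_score(row_x, row_y, rng):
    """Resample alignment columns with replacement and count identities."""
    if len(row_x) != len(row_y):
        raise ValueError("aligned rows must have equal length")
    n = len(row_x)
    columns = [rng.randrange(n) for _ in range(n)]
    return sum(1 for i in columns if row_x[i] == row_y[i] and row_x[i] != "-")


def is_best_best(scores, a, b):
    """True if (a, b) is the top-scoring pair for both a and b (ties allowed)."""
    best_for_a = max(s for (x, _), s in scores.items() if x == a)
    best_for_b = max(s for (_, y), s in scores.items() if y == b)
    return scores[(a, b)] == best_for_a == best_for_b


def ortholog_confidence(alignments, a, b, n_boot=1000, seed=0):
    """Fraction of bootstrap replicates in which (a, b) stays a best-best match."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_boot):
        scores = {pair: bootstrap_score(x, y, rng)
                  for pair, (x, y) in alignments.items()}
        hits += is_best_best(scores, a, b)
    return hits / n_boot


# First alignment is the one on the slide; the other two are invented competitors.
alignments = {
    ("a1", "b1"): ("TVHIVDDEEPVR---KSLAFM---LTMNGFA",
                   "TILLIDDHPMLRTGVKQLISMAPDITVVGEA"),
    ("a1", "b2"): ("TVHIVDDEEPVRKSLAFMLTMNGFA",
                   "SVKLLDNQAPARHTIGYLLSQDGYE"),
    ("a2", "b1"): ("SIHLVDNEQPLRTGIKELLSM",
                   "TILLIDDHPMLRTGVKQLISM"),
}
confidence = ortholog_confidence(alignments, "a1", "b1", n_boot=200)
print(f"confidence for a1-b1: {confidence:.2f}")
```

The fixed `seed` only makes the sketch reproducible; the value printed depends on the invented competitor alignments and has no biological meaning.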

18
http://inparanoid.cgb.ki.se
19
inparanoid.cgb.ki.se
Homo Sapiens vs. C. elegans
Remm et al., J. Mol. Biol. 314:1041-1052 (2001)
20
Ortholog group sizes, human vs X
21
Nr of inparalogs per ortholog group
Species        Avg. inparalogs in model     Avg. inparalogs in human
               organism ortholog groups     ortholog groups
Mouse          1.36                         1.56
Fly            1.77                         2.75
Worm           1.44                         3.13
Mustard weed   3.73                         3.33
Yeast          1.26                         3.34
E. coli        1.73                         3.57
22
Drawbacks of Blast-based orthology assignment
  • No guarantee that the same segment is used in
    different sequences
  • No evolutionary distance model
  • Does not take multiple domains into account

23
Domain orthology
  • Inparanoid Human-Fly ortholog pairs with domains
    in Pfam-A 13.0: 20335
  • Different domain architectures: 5411
  • Many of these are minor differences, e.g. 22 vs
    21 Spectrin repeats
  • Sometimes the difference is big
  • ef-hand UCH
  • TBC UCH

24
Tree-based approaches
25
Distance-based tree building
A1 MKFYSLPNFPEN
A2 MKYYKLPDLPDE
A3 MRFYTACENPRS
Distance matrix
     A2   A3
A1    4    8
A2        10
[Figure: tree with leaves A1, A2, A3 and branch lengths 1, 2, 3, 5.]
  • Bootstrapping
  • Randomly pick columns to build a bootstrap
    alignment, calculate tree
  • Repeat 1000 times; frequency of a node =
    its bootstrap support
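The distance matrix on this slide (A1-A2 = 4, A1-A3 = 8, A2-A3 = 10) can be turned into branch lengths with the standard three-point formula for a three-leaf tree. The sketch below is only a worked check of that arithmetic; `three_point_branch_lengths` is a hypothetical name, and it does not reproduce whatever tree-building program was used for the slide.

```python
def three_point_branch_lengths(d12, d13, d23):
    """Branch lengths from the central node to leaves 1, 2, 3.

    For an additive three-leaf tree, each pairwise distance is the sum of
    the two branches it crosses, which gives the formulas below.
    """
    b1 = (d12 + d13 - d23) / 2
    b2 = (d12 + d23 - d13) / 2
    b3 = (d13 + d23 - d12) / 2
    return b1, b2, b3


# Distances from the slide's matrix.
b_a1, b_a2, b_a3 = three_point_branch_lengths(d12=4, d13=8, d23=10)
print(b_a1, b_a2, b_a3)  # 1.0 3.0 7.0
```

The branch lengths 1 and 3 match two of the numbers in the slide's tree; the remaining 7 is compatible with the other two numbers (2 + 5) if the tree is drawn rooted on the A3 side, though that reading of the figure is an assumption.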

26
Orthology by tree reconciliation
Species tree
Gene tree
Infer 2 duplications and 2 losses
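A common way to reconcile a gene tree with a species tree is to map every gene-tree node to the smallest species-tree clade that contains all of its species, and to call the node a duplication when it maps to the same clade as one of its children. The sketch below assumes that approach; the trees, gene names, and function names are invented (they are not the trees on the slide), and losses are not counted.

```python
def clades(tree):
    """Leaf sets of all subtrees of a nested-tuple tree; whole tree comes last."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    result, leaves = [], set()
    for child in tree:
        sub = clades(child)
        result.extend(sub)
        leaves |= sub[-1]  # last entry is the child's complete leaf set
    result.append(frozenset(leaves))
    return result


def lca_clade(species, species_clades):
    """Smallest species-tree clade containing all the given species."""
    return min((c for c in species_clades if species <= c), key=len)


def label_events(gene_tree, species_of, species_clades, events):
    """Post-order walk; returns (gene leaves, mapped species clade)."""
    if isinstance(gene_tree, str):
        one = frozenset([species_of[gene_tree]])
        return [gene_tree], lca_clade(one, species_clades)
    genes, child_maps = [], []
    for child in gene_tree:
        g, m = label_events(child, species_of, species_clades, events)
        genes += g
        child_maps.append(m)
    mapped = lca_clade(frozenset().union(*child_maps), species_clades)
    kind = "duplication" if mapped in child_maps else "speciation"
    events.append((kind, tuple(genes)))
    return genes, mapped


# Hypothetical example trees.
species_tree = (("human", "mouse"), "fly")
gene_tree = ((("h1", "m1"), ("h2", "m2")), "f1")
species_of = {"h1": "human", "h2": "human",
              "m1": "mouse", "m2": "mouse", "f1": "fly"}

events = []
label_events(gene_tree, species_of, clades(species_tree), events)
duplications = [genes for kind, genes in events if kind == "duplication"]
print(duplications)  # [('h1', 'm1', 'h2', 'm2')]
```

In this toy case one duplication is inferred, at the node joining the two human-mouse gene pairs; the fly lineage keeping only one copy would additionally imply a loss, which this sketch does not report.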
27
Drawbacks of tree reconciliation for orthology
assignment
  • Assumption that the species tree is fully known
  • Does not give confidence values
  • Gene trees become unreliable when involving a lot
    of sequences (more data -> less certainty)
  • Computationally expensive

28
Partial tree reconciliation
  • Find pairwise orthologs by computer parsing of
    tree.

29
Pairwise orthology confidence by orthostrapping
The original tree with bootstrap support values
30
Pairwise orthology confidence by orthostrapping
31
Pairwise orthology confidence by orthostrapping
32
Pairwise orthology confidence by orthostrapping
33
orthostrapper.cgb.ki.se
34
Orthology is not transitive!
Multiple species at different distances may give
erroneous groups that include out-paralogs
35
Orthology is not transitive!
[Figure: gene tree with leaves Y, H1, D1, H2, D2, shown next to the subset Y, H2, D1.]
-> Orthology is strictly defined for only 2 species/clades
Combining species of different distances is very dangerous
But OK to combine multiple equidistant ones
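The non-transitivity can be checked on a small labeled gene tree. The sketch below uses the gene names from the slide (Y, H1, D1, H2, D2), but the tree topology and the speciation/duplication labels are assumptions: Y is taken to be an outgroup gene, followed by a duplication and then a speciation in each copy. `relation` is a hypothetical helper that classifies a pair by the event at its last common ancestor.

```python
def leaves(node):
    """Set of gene names below a node; a leaf is a plain string."""
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return leaves(left) | leaves(right)


def relation(node, x, y):
    """Classify two different genes by the event at their last common ancestor."""
    event, left, right = node
    for child in (left, right):
        if {x, y} <= leaves(child):
            return relation(child, x, y)  # both genes lie deeper in this child
    return "orthologs" if event == "speciation" else "paralogs"


# Assumed topology: nodes are (event, left subtree, right subtree).
gene_tree = ("speciation", "Y",
             ("duplication",
              ("speciation", "H1", "D1"),
              ("speciation", "H2", "D2")))

print(relation(gene_tree, "Y", "H2"))   # orthologs
print(relation(gene_tree, "Y", "D1"))   # orthologs
print(relation(gene_tree, "H2", "D1"))  # paralogs
```

Under these assumptions Y is an ortholog of both H2 and D1, yet H2 and D1 are separated by the duplication and are out-paralogs, so a group built by chaining pairwise orthologs would contain them together.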
36
Domain-level orthology
37
HOPS - Hierarchy of Orthologs and Paralogs
  1. All species in Pfam are bundled in groups
    according to scheme
  2. Apply Orthostrapper to groups at same level in
    Pfam families
  3. Display results in NIFAS

38
Pfam
39
Pfam in brief
SEED alignment representative members
Profile-HMM HMMer-2.0
Search database
FULL alignment
Description file
Manually curated
Automatically made
  • Release 13.0 (April 2004)
  • 7426 Pfam-A domain families
  • Based on 1160000 sequences (Swissprot + Trembl)
  • 21980 unique Pfam-A domain architectures
  • 73% of all proteins have >=1 Pfam-A domain

40
HOPS results
  • Pfam 10, 6190 families
  • 2450 families (40%) have HOPS orthologs
  • 1319 families (21%) have HOPS orthologs in all
    6 pairwise comparisons
  • 286356 pairwise orthology assignments (> 75%
    orthostrap)

Storm and Sonnhammer, Genome Research
13:2353-2362 (2003)
41
Ways to access HOPS
  • NIFAS graphical browser
  • By sequence ID at Pfam.cgb.ki.se/HOPS
  • Flatfiles (Orthostrap tables of 2 clades)

42
Pfam.cgb.ki.se/HOPS
45
Evolution of Domain Architectures
  • NIFAS

46
ATP sulfurylase / APS kinase
47
ATP sulfurylase domain, metazoa vs fungi
Orthologous shuffled domains?
48
APS kinase domain
49
HOPS orthologs of PPS1_HUMAN (ATP sulfurylase/APS
kinase)
50
Summary of ATP sulfurylases/APS kinases
Shuffled non-orthologous domains
Metazoa
Fungi
51
Conclusions
  • Orthologs can be detected by
  • Blast: fast
  • Tree: slow but less error-prone
  • Species at different evolutionary distances
    should not be combined in orthology analysis
  • Inparanoid and Orthostrapper were designed to
    find inparalogs but not outparalogs
  • HOPS/NIFAS can be used to find domain orthologs
    and analyze domain architecture evolution

52
Future perspectives
  • Multiparanoid: multiple-species merging of
    pairwise inparalogs.
  • Functional divergence among inparalogs

53
Acknowledgments
  • Christian Storm
  • Maido Remm
  • Andrey Alexeyenko
  • Volker Hollich
  • Mats Jonsson

http://sonnhammer.cgb.ki.se