Title: Bioinformatics 40400


1
Bioinformatics 40400
  • Gianluca Pollastri
  • Office: CS A1.07
  • Email: gianluca.pollastri_at_ucd.ie

2
Credits
  • Richard Lathrop's and Pierre Baldi's Bioinformatics
    courses at the University of California, Irvine.

3
Phylogenetics credits
  • Aoife McLysaght, Trinity College Dublin: I
    borrowed some of her slides for classes in
    Molecular Evolution.

4
Course overview
  • Context: DNA, RNA, proteins
  • Resources: GenBank, PDB, etc.
  • Algorithms for sequence comparison.
  • Phylogenetic trees.
  • Protein structure prediction.

5
Lecture notes
  • http://gruyere.ucd.ie/2007_courses/40400/
  • confidential..

6
Recommended/useful readings
  • No book is actually required
  • Introduction to Computational Molecular Biology
    (Setubal, Meidanis)
  • Introduction to Bioinformatics (Lesk)
  • Bioinformatics: The Machine Learning Approach
    (Baldi, Brunak)
  • Biological Sequence Analysis (Eddy, Durbin, Krogh,
    Mitchison); this is a tough one

7
Course marking
  • 4 small things to do at home, each worth 10%
  • 60% in the final exam

8
Reconstruct phylogeny from molecular data
[Figure: the sequence ACTGTTACCGA shown five times, with a "?" in place of the tree relating the copies]

9
Note
  • There are two pieces of information in a
    phylogenetic tree:
  • the topology: the order of divergence events
  • branch lengths: the extent of sequence divergence

10
Note 2
  • If we build a phylogeny based on one kind of
    sequence (for instance, a group of sequences from
    different organisms, one for each organism, that
    show similarity to each other), what we are
    building is a phylogeny of the sequences and not
    necessarily of the organisms.
  • For example, horizontal transfer: a gene might
    have been transferred from one organism to another
    at some stage.

11
Note 2 bis
  • Orthologues: sequence divergence occurred after a
    speciation event
  • Paralogues: sequence divergence occurred after a
    duplication within a genome; the copies may coexist
    in the same genome
  • Alignments can't tell between them, but phylogeny
    might.

12
About complexity of phylogeny
13
Methods of Tree reconstruction
  • Distance based: UPGMA, Neighbour Joining
  • Maximum Parsimony
  • Maximum Likelihood (and full Bayesian)

14
Genetic distance
  • Distance from one sequence to another
  • Hamming Distance
  • Count number of differences
  • Attention: there might be multiple hits, so the
    number of events can be greater than the number of
    differences (some events cancel each other).
  • We would like to estimate the number of events
  • Remember PAM matrices: PAM250 is equivalent to
    about 20% similarity, etc.
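
The difference between counted differences and estimated events can be sketched in a few lines of Python. This is only an illustration: `seq2` is a made-up variant of the sequence from slide 8, and the Jukes-Cantor correction is one standard way to estimate the number of substitution events per site for DNA; the slides do not say which correction the course uses.

```python
import math

def hamming(a: str, b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

def p_distance(a: str, b: str) -> float:
    """Fraction of differing positions (Hamming distance / length)."""
    return hamming(a, b) / len(a)

def jukes_cantor(p: float) -> float:
    """Estimated substitutions per site for DNA under the Jukes-Cantor model.

    Assumption: all substitutions are equally likely; only defined for p < 0.75.
    """
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)

seq1 = "ACTGTTACCGA"   # sequence from slide 8
seq2 = "ACTGTAACCGT"   # hypothetical variant, for illustration only

p = p_distance(seq1, seq2)
print(hamming(seq1, seq2), p, jukes_cantor(p))
```

The corrected value is always at least as large as the raw fraction of differences, which matches the remark that some events are hidden by multiple hits.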

15
Distance methods: UPGMA
  • Unweighted Pair-Group Method with Arithmetic
    means
  • Assumes constant molecular clock
  • Simplest method, often dangerous

16
Distance Matrix
17
UPGMA
  • dAB is the smallest distance
  • Group A and B
  • Branch length: dAB/2 (here we assume the evolution
    rate is constant)
  • Recalculate distances from AB to other taxa as
    the average:
  • d(AB)C = (dAC + dBC)/2

[Figure: A and B joined at a node, each branch of length 0.15/2]
18
UPGMA
  • new distance matrix
  • Find smallest distance and continue as before
  • Repeat until all taxa are on tree
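
The loop described above can be sketched as follows. This is a minimal illustration, not the course's implementation: the distance matrix is hypothetical (only dAB = 0.15 comes from the slides), and the update uses the cluster-size-weighted average, which reduces to (dAC + dBC)/2 when A and B are single taxa.

```python
from itertools import combinations

def upgma(names, dist):
    """UPGMA clustering.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) -> distance.
    Returns (root_label, merges), where merges lists (a, b, node_height).
    """
    sizes = {name: 1 for name in names}   # cluster label -> number of leaves
    d = dict(dist)
    merges = []
    new = names[0]
    while len(sizes) > 1:
        # Find the pair of clusters with the smallest distance.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        dab = d[frozenset((a, b))]
        new = f"({a},{b})"
        merges.append((a, b, dab / 2))    # node height = half the distance
        na, nb = sizes.pop(a), sizes.pop(b)
        # Distance from the merged cluster to every remaining cluster:
        # average over all leaf pairs, i.e. weighted by cluster size.
        for c in sizes:
            d[frozenset((new, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[new] = na + nb
    return new, merges

# Hypothetical distances; only dAB = 0.15 is taken from the slides.
names = ["A", "B", "C", "D"]
dist = {frozenset(k): v for k, v in {
    ("A", "B"): 0.15, ("A", "C"): 0.30, ("B", "C"): 0.34,
    ("A", "D"): 0.50, ("B", "D"): 0.54, ("C", "D"): 0.52,
}.items()}

root, merges = upgma(names, dist)
print(root)
for a, b, height in merges:
    print(a, b, round(height, 4))
```

Each merge records the height of the new node, so the branch leading to a leaf is that height, and the branch leading to an earlier cluster is the difference between the two node heights.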

19
[Figure: UPGMA tree with leaves A, B, C, D; branch labels dAB/2 (to A), dAB/2 (to B), d(AB)C/2 (to C), d(ABC)D/2 (to D)]
21
UPGMA example
  • Started from horse myoglobin.
  • Looked for homologues with BLAST.
  • Collected a number of myoglobins

22
>uniprot|P02192|MYG_BOVIN Myoglobin.
MGLSDGEWQLVLNAWGKVEADVAGHGQEVLIRLFTGHPETLEKFDKFKHLKTEAEMKASE
DLKKHGNTVLTALGGILKKKGHHEAEVKHLAESHANKHKIPVKYLEFISDAIIHVLHAKH
PSDFGADAQAAMSKALELFRNDMAAQYKVLGFHG
>uniprot|P02197|MYG_CHICK Myoglobin.
MGLSDQEWQQVLTIWGKVEADIAGHGHEVLMRLFHDHPETLDRFDKFKGLKTPDQMKGSE
DLKKHGATVLTQLGKILKQKGNHESELKPLAQTHATKHKIPVKYLEFISEVIIKVIAEKH
AADFGADSQAAMKKALELFRNDMASKYKEFGFQG
>uniprot|P68082|MYG_HORSE Myoglobin.
MGLSDGEWQQVLNVWGKVEADIAGHGQEVLIRLFTGHPETLEKFDKFKHLKTEAEMKASE
DLKKHGTVVLTALGGILKKKGHHEAELKPLAQSHATKHKIPIKYLEFISDAIIHVLHSKH
PGDFGADAQGAMTKALELFRNDIAAKYKELGFQG
>uniprot|P02144|MYG_HUMAN Myoglobin.
MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKASE
DLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSKH
PGDFGADAQGAMNKALELFRKDMASNYKELGFQG
>uniprot|P04247|MYG_MOUSE Myoglobin.
MGLSDGEWQLVLNVWGKVEADLAGHGQEVLIGLFKTHPETLDKFDKFKNLKSEEDMKGSE
DLKKHGCTVLTALGTILKKKGQHAAEIQPLAQSHATKHKIPVKYLEFISEIIIEVLKKRH

23
Pairwise alignment scores (sequence number, identifier, length; last column is the score):

 #  Sequence A                Len   #  Sequence B                Len  Score
 1  uniprot|P02192|MYG_BOVIN  154   2  uniprot|P02197|MYG_CHICK  154     72
 1  uniprot|P02192|MYG_BOVIN  154   3  uniprot|P68082|MYG_HORSE  154     88
 1  uniprot|P02192|MYG_BOVIN  154   4  uniprot|P02144|MYG_HUMAN  154     84
 1  uniprot|P02192|MYG_BOVIN  154   5  uniprot|P04247|MYG_MOUSE  154     78
 1  uniprot|P02192|MYG_BOVIN  154   6  uniprot|P02189|MYG_PIG    154     88
 1  uniprot|P02192|MYG_BOVIN  154   7  uniprot|P02170|MYG_RABIT  154     88
 1  uniprot|P02192|MYG_BOVIN  154   8  uniprot|P02190|MYG_SHEEP  154     98
 1  uniprot|P02192|MYG_BOVIN  154   9  uniprot|P68279|MYG_TURTR  154     85
 2  uniprot|P02197|MYG_CHICK  154   3  uniprot|P68082|MYG_HORSE  154     75
 2  uniprot|P02197|MYG_CHICK  154   4  uniprot|P02144|MYG_HUMAN  154     76
 2  uniprot|P02197|MYG_CHICK  154   5  uniprot|P04247|MYG_MOUSE  154     74
 2  uniprot|P02197|MYG_CHICK  154   6  uniprot|P02189|MYG_PIG    154     76
 2  uniprot|P02197|MYG_CHICK  154   7  uniprot|P02170|MYG_RABIT  154     76
 2  uniprot|P02197|MYG_CHICK  154   8  uniprot|P02190|MYG_SHEEP  154     72
 2  uniprot|P02197|MYG_CHICK  154   9  uniprot|P68279|MYG_TURTR  154     72
 3  uniprot|P68082|MYG_HORSE  154   4  uniprot|P02144|MYG_HUMAN  154     88
 3  uniprot|P68082|MYG_HORSE  154   5  uniprot|P04247|MYG_MOUSE  154     82
 3  uniprot|P68082|MYG_HORSE  154   6  uniprot|P02189|MYG_PIG    154     90
 3  uniprot|P68082|MYG_HORSE  154   7  uniprot|P02170|MYG_RABIT  154     89
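
A short sketch of how such scores could feed UPGMA, assuming the last number in each row is a percent identity and that the distance is taken as 1 - score/100. That conversion is an assumption, but it is consistent with the 0.01 branch lengths that appear for Sheep and Cow in the tree slides (score 98, distance 0.02, branch 0.02/2). Only the rows listed on this slide are included.

```python
# Scores copied from the slide; keys are the species codes from the identifiers.
scores = {
    ("BOVIN", "CHICK"): 72, ("BOVIN", "HORSE"): 88, ("BOVIN", "HUMAN"): 84,
    ("BOVIN", "MOUSE"): 78, ("BOVIN", "PIG"): 88, ("BOVIN", "RABIT"): 88,
    ("BOVIN", "SHEEP"): 98, ("BOVIN", "TURTR"): 85,
    ("CHICK", "HORSE"): 75, ("CHICK", "HUMAN"): 76, ("CHICK", "MOUSE"): 74,
    ("CHICK", "PIG"): 76, ("CHICK", "RABIT"): 76, ("CHICK", "SHEEP"): 72,
    ("CHICK", "TURTR"): 72,
    ("HORSE", "HUMAN"): 88, ("HORSE", "MOUSE"): 82, ("HORSE", "PIG"): 90,
    ("HORSE", "RABIT"): 89,
}

def score_to_distance(score: float) -> float:
    """Assumed conversion: percent identity -> fraction of differing sites."""
    return 1 - score / 100

distances = {pair: score_to_distance(s) for pair, s in scores.items()}

# First UPGMA step: the closest pair and the branch length to each of its members.
closest = min(distances, key=distances.get)
branch = distances[closest] / 2
print(closest, round(distances[closest], 4), round(branch, 4))
```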

26
[Figure: tree for the myoglobin example. Leaves: Sheep, Cow, Chick, Horse, Human, Mouse, Pig, Rabbit, Dolphin. Branch lengths shown: 0.01, 0.01.]
36
[Figure: tree for the myoglobin example. Leaves: Sheep, Cow, Human, Pig, Chick, Horse, Mouse, Rabbit, Dolphin. Branch lengths shown: 0.01, 0.01, 0.035, 0.035.]
42
[Figure: tree for the myoglobin example. Leaves: Sheep, Cow, Human, Pig, Rabbit, Chick, Horse, Mouse, Dolphin. Branch lengths shown: 0.01, 0.01, 0.035, 0.0125, 0.035, 0.0475.]
46
[Figure: tree for the myoglobin example. Leaves: Sheep, Cow, Human, Pig, Rabbit, Horse, Chick, Mouse, Dolphin. Branch lengths shown: 0.01, 0.01, 0.035, 0.0125, 0.035, 0.0075, 0.0475, 0.055.]
49
[Figure: tree for the myoglobin example. Leaves: Sheep, Cow, Human, Pig, Rabbit, Horse, Dolphin, Chick, Mouse. Branch lengths shown: 0.01, 0.01, 0.035, 0.0125, 0.035, 0.0075, 0.0475, 0.0033, 0.055, 0.0588.]
52
[Figure: tree for the myoglobin example. Leaves: Sheep, Cow, Human, Pig, Rabbit, Horse, Dolphin, Chick, Mouse. Branch lengths shown: 0.01, 0.0579, 0.01, 0.035, 0.0125, 0.035, 0.0075, 0.0475, 0.0033, 0.055, 0.0091, 0.0588.]
55
[Figure: tree for the myoglobin example. Leaves: Sheep, Cow, Human, Pig, Rabbit, Horse, Dolphin, Mouse, Chick. Branch lengths shown: 0.01, 0.0579, 0.01, 0.035, 0.0125, 0.0276, 0.035, 0.0075, 0.0475, 0.0033, 0.055, 0.0091, 0.0588, 0.0955.]
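57
[Figure: final tree for the myoglobin example. Leaves: Sheep, Cow, Human, Pig, Rabbit, Horse, Dolphin, Mouse, Chick. Branch lengths shown: 0.01, 0.0579, 0.01, 0.035, 0.0125, 0.0276, 0.035, 0.0075, 0.0475, 0.0033, 0.0374, 0.055, 0.0091, 0.0588, 0.0955, 0.1329.]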
58
Neighbour Joining (NJ): Saitou and Nei, 1987
  • Another distance based method.
  • As for UPGMA, we first compute the distance
    matrix, by aligning all pairs of sequences,
    pairwise.
  • Based on the minimum evolution criterion:
    minimise the sum of the branch lengths

59
Neighbours
  • Neighbours are OTUs (operational taxonomic units),
    leaves of the tree connected by a node
  • If we join leaves, we create new neighbours

[Figure: tree with leaves 1, 2, 3 and 4]
60
Start from a star tree
  • All nodes are neighbours at the beginning
  • Just one OTU, x, at the beginning

[Figure: star tree with leaves 1, 2, 3, 4 and N around the central node x]
61
Join nodes
  • We want to join the two nodes that give the
    minimal sum of branches.
  • By joining nodes we create a new OTU, y

[Figure: tree with leaves 1, 2, 3, 4 and N and internal nodes x and y]
62
Join nodes
  • If I have N nodes, there are N(N-1)/2 ways of
    choosing two of them.
  • Let's call Lab the distance between OTU a and b,
    and Dij the distance between nodes i and j.
  • Then, the total distance for the star tree is
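
A sketch of the star-tree sum, in the spirit of Saitou and Nei's derivation. It assumes that $L_{ix}$ is the branch from leaf $i$ to the central node $x$ and that the matrix distances are additive along the tree, so $D_{ij} = L_{ix} + L_{jx}$; the symbol $S_0$ is a name introduced here for illustration. Each branch $L_{ix}$ then appears in exactly $N-1$ of the pairwise distances:

```latex
S_0 = \sum_{i=1}^{N} L_{ix} = \frac{1}{N-1} \sum_{i<j} D_{ij}
```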

63
Lxy
  • Once we've joined nodes 1 and 2, the distance Lxy
    will be built from three terms:
  • all distances from 1 and 2 through xy
  • minus all the times we've counted D12
  • minus all the distances we've counted in the
    second OTU x
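
The three terms can be written out as follows. This is a hedged reconstruction following Saitou and Nei, assuming (from the figure labels) that leaves 1 and 2 hang off the new node $y$ and leaves $3, \dots, N$ stay attached to $x$. There are $2(N-2)$ paths from leaf 1 or 2 to another leaf; together they cross the branch $xy$ that many times, use $L_{1y}$ and $L_{2y}$ $N-2$ times each, and use each $L_{ix}$ twice:

```latex
L_{xy} = \frac{1}{2(N-2)} \left[ \sum_{k=3}^{N} (D_{1k} + D_{2k})
         - (N-2)\,(L_{1y} + L_{2y})
         - 2 \sum_{i=3}^{N} L_{ix} \right]
```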
64
substitute
  • The two subtractive terms are really the sums of
    the Lix in two star trees, so we can compute them
    from the

65
finally
  • The sum of the branches we obtain by joining 1
    and 2 is then
  • (all expressed in Dij, which we can obtain from
    the matrix)
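
A sketch of the resulting expression, again following Saitou and Nei and assuming that leaves 1 and 2 hang off node $y$, leaves $3, \dots, N$ hang off node $x$, and $L_{xy} = \frac{1}{2(N-2)} \left[ \sum_{k=3}^{N} (D_{1k} + D_{2k}) - (N-2)(L_{1y} + L_{2y}) - 2 \sum_{i=3}^{N} L_{ix} \right]$. The two small star trees give the first line (the second identity needs $N > 3$), and substituting them yields the sum $S_{12}$:

```latex
\begin{aligned}
L_{1y} + L_{2y} &= D_{12}, \qquad
\sum_{i=3}^{N} L_{ix} = \frac{1}{N-3} \sum_{3 \le i < j \le N} D_{ij} \\
S_{12} &= L_{xy} + (L_{1y} + L_{2y}) + \sum_{i=3}^{N} L_{ix} \\
       &= \frac{1}{2(N-2)} \sum_{k=3}^{N} (D_{1k} + D_{2k})
          + \frac{1}{2} D_{12}
          + \frac{1}{N-2} \sum_{3 \le i < j \le N} D_{ij}
\end{aligned}
```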

66
The algorithm
  • do
  • Scan all pairs of nodes to find the one with the
    lowest Sij.
  • Join i and j in a single node, re-estimate all
    branch lengths.
  • until just one node is left
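
The loop can be sketched in Python as below. This is an illustration under stated assumptions, not the course's code: Sij is computed with the Saitou and Nei sum-of-branches formula, the branch-length and distance-update rules are the standard neighbour-joining ones (the slides do not spell them out), and the four-taxon distance matrix is a made-up additive example in which B evolves faster than the others.

```python
from itertools import combinations

def neighbour_joining(names, dist):
    """Neighbour joining.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) -> distance.
    Returns a list of (node, other_node, branch_length) edges.
    """
    nodes = list(names)
    d = dict(dist)
    edges = []
    while len(nodes) > 2:
        n = len(nodes)
        # r[i]: sum of distances from i to every other current node.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        total = sum(r.values()) / 2       # sum over all pairs

        def s(pair):
            """Sum of branch lengths of the tree obtained by joining the pair."""
            i, j = pair
            dij = d[frozenset((i, j))]
            return ((r[i] + r[j] - 2 * dij) / (2 * (n - 2))
                    + dij / 2
                    + (total - r[i] - r[j] + dij) / (n - 2))

        i, j = min(combinations(nodes, 2), key=s)
        dij = d[frozenset((i, j))]
        new = f"({i},{j})"
        # Standard NJ branch lengths from i and j to the new node.
        li = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))
        edges += [(i, new, li), (j, new, dij - li)]
        # Distances from the new node to the remaining nodes.
        rest = [k for k in nodes if k not in (i, j)]
        for k in rest:
            d[frozenset((new, k))] = (
                d[frozenset((i, k))] + d[frozenset((j, k))] - dij
            ) / 2
        nodes = rest + [new]
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))   # connect the last two nodes
    return edges

# Hypothetical additive distances: the smallest entry is A-C, yet A and B are neighbours.
names = ["A", "B", "C", "D"]
dist = {frozenset(k): v for k, v in {
    ("A", "B"): 5, ("A", "C"): 4, ("A", "D"): 5,
    ("B", "C"): 7, ("B", "D"): 8, ("C", "D"): 5,
}.items()}

edges = neighbour_joining(names, dist)
for node, other, length in edges:
    print(node, other, length)
```

In this example the smallest distance is between A and C, so a method that simply groups the closest pair would join them, whereas minimising Sij pairs A with B and C with D.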