Title: Bioinformatics PhD course
Bioinformatics
Xavier Messeguer Peypoch (http://www.lsi.upc.es/alggen)
LSI, Dep. de Llenguatges i Sistemes Informàtics
BSC, Barcelona Supercomputing Center
Universitat Politècnica de Catalunya
Contents
1. Biological introduction
2. Comparison of short sequences (up to 10,000 bp)
   Dot matrix, pairwise alignment, multiple alignment, hash algorithms
3. Comparison of large sequences (more than 10,000 bp)
   Data structures, suffix trees, MUMs
4. String matching
   Exact, extended, approximate
5. Sequence assembly
6. Projects: PROMO, MREPATT, ...
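Pairwise alignment
- Recall that with two strings S1 and S2 of length n, the cost of dynamic programming is O(n^2): the matrix has n^2 cells, and each cell is computed from 2^2 - 1 = 3 neighbouring cells.
[Figure: dynamic programming matrix with S1 and S2 on the axes]
And with 3 strings?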
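The quadratic cost for two strings can be illustrated with a minimal sketch of global alignment scoring by dynamic programming. The function name `pairwise_score` and the scoring values (match 1, mismatch -1, gap -1) are assumptions chosen for illustration, not values fixed by the course.

```python
def pairwise_score(s1, s2, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of s1 and s2 (scores are assumed values)."""
    n, m = len(s1), len(s2)
    # score[i][j] = best score for aligning s1[:i] with s2[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # s1[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # s2[:j] aligned against gaps only
    # (n+1)*(m+1) cells, each computed from 2^2 - 1 = 3 neighbours
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align s1[i-1] with s2[j-1]
                score[i - 1][j] + gap,      # s1[i-1] against a gap
                score[i][j - 1] + gap,      # s2[j-1] against a gap
            )
    return score[n][m]
```

Each cell costs constant time, so two strings of length n need on the order of n^2 operations in total.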
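Multiple alignment
- What happens with three strings?
Let n be their length; then the cost becomes O(n^3): there are n^3 cells, and each cell is computed from 2^3 - 1 = 7 neighbouring cells.
[Figure: dynamic programming cube for three strings]
And with k strings?
O(n^k 2^k k^2): n^k cells, 2^k - 1 neighbouring cells per cell, and k^2 character pairs to score in each alignment column.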
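The three factors of the bound for k strings can be seen in a minimal sketch of exact multiple alignment scoring. Sum-of-pairs scoring, the function name `multiple_score`, and the scoring values (match 1, mismatch -1, gap -1, gap against gap 0) are assumptions made for illustration. The sketch is only practical for a few short strings, which is the point of the bound.

```python
from itertools import combinations, product

def multiple_score(seqs, match=1, mismatch=-1, gap=-1):
    """Best sum-of-pairs score of a global alignment of all strings in seqs."""
    k = len(seqs)
    lengths = [len(s) for s in seqs]
    # The 2^k - 1 moves: each string either consumes a character (1)
    # or receives a gap (0); the all-gap move is excluded.
    moves = [mv for mv in product((0, 1), repeat=k) if any(mv)]

    def column_score(column):
        # Scores the k*(k-1)/2 character pairs of one alignment column.
        total = 0
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue  # assumed: a gap against a gap scores 0
            if a == "-" or b == "-":
                total += gap
            else:
                total += match if a == b else mismatch
        return total

    best = {tuple([0] * k): 0}
    # product() yields cells in lexicographic order, so every
    # predecessor of a cell is computed before the cell itself.
    for cell in product(*(range(length + 1) for length in lengths)):
        if not any(cell):
            continue  # the origin is already initialised
        candidates = []
        for move in moves:
            prev = tuple(c - d for c, d in zip(cell, move))
            if min(prev) < 0:
                continue  # this move would step outside the table
            column = [seqs[i][cell[i] - 1] if move[i] else "-" for i in range(k)]
            candidates.append(best[prev] + column_score(column))
        best[cell] = max(candidates)
    return best[tuple(lengths)]
```

With k = 2 this reduces to ordinary pairwise alignment. The exponential growth in k is why the programs used in practice rely on heuristics such as progressive alignment.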
Multiple alignment programs
- Malig (Progressive alignment)
- http://alggen.lsi.upc.edu
- Clustal (Progressive alignment)
- http://www.ebi.ac.uk/clustalw
- TCoffee (Progressive alignment, databases)
- http://igs-server.cnr-mrs.fr/Tcoffee_cgi/index.cgi
- HMM (Hidden Markov Models)
Multiple progressive alignment
- Run Malig (Progressive alignment): http://alggen.lsi.upc.edu
- Run Clustal (Progressive alignment): http://www.ebi.ac.uk/clustalw