Title: Multiple Sequence Alignment
1Multiple Sequence Alignment
- ClustalW
- TCoffee
- Ka, Ks, and Ka/Ks
- Anchored alignment
2ClustalW
- http://www.ebi.ac.uk/clustalw/
3ClustalW
4ClustalW
Paste your sequences
Run
5Results
Press "Start Jalview" for an interactive view of the
alignment
6ClustalW output format
Guide Tree Cladogram
7Exercise
- HomoloGene is a system for automated detection of
homologs among the annotated genes of several
completely sequenced eukaryotic genomes.
- Download the FASTA sequences of HomoloGene:5276
and align them with ClustalW
9Result
10TCoffee
Tcoffee computes its alignments by combining a
collection of smaller alignments
11Main features
- Multiple Sequence Alignment
- Structure based Multiple Sequence Alignment
- Combining the output of several multiple sequence
alignment packages
- Combining two (or more) multiple sequence
alignments into a single one
- Turning amino acid alignments into CDS nucleotide
alignments
12Exercise
- The 18-kDa protein plays an important role in the
fertilization of several abalone species
- Build a multiple sequence alignment using the
following sequences
13Sequences
>gi|604533|gb|AAC37231.1| fertilization protein
MRSLVLLCVLLMAICAADKKTSVSKENEAAMKVAMMKFLDMKAGVFKEIIEDMGYPITPPQWTTLLYYNR
ERLIEFCRSFLALSKKIILLGGNKLNKANFARMGRILGWKSQWAVRQRQWGMVRVSRRHTSTAIAKRIVA
MKVADLPCN
>gi|604531|gb|AAC37233.1| fertilization protein
MRFLLLLCVLMGAVSQAVCRKRPNVWGKIVVKEKNKAAMKIGFMEYLDAKLVKFKRHWLVGANWKLQKFE
TDEMRYLAIKRLIKVCHGYTIWSQRLIMLKYRPLNEKYFKKVGRYLAWRNYLIVFRMWIGVLKKNLKRSE
ITKPMQKLLDTKDGELPCPVRKIHG
>gi|604529|gb|AAC37232.1| fertilization protein
MRSLVLLCVLMAVGCVAFDDVVVSRQEQSYVQRGMVNFLDEEMHKLVKRFRDMRWNLGPGFVFLLKKVNR
ERMMRYCMDYARYSKKILQLKHLPVNKKTLTKMGRFVGYRNYGVIRELYADVFRDVQGFRGPKMTAAMRK
YSSKDPGTFPCKNEKRRG
>gi|604527|gb|AAC37230.1| fertilization protein
MRSLVLLCVLLMAICAADKKTTVSKENAAAMKIAMIKFLDARAGKFKKRVENMGYPITPPQWTTLLYYNR
QRLMEWCHTYVEFSKKIILMGGNKLNKKNFTRMGRIIGWKNQWVLKRRQWEMVRVMRRYKSTAIAKKIVA
MKVADLPCN
14Choose TCoffee Regular, paste the sequences in
the data box, and press submit
16Estimating the rate of evolution
- In order to study selection patterns, you will
need to have the corresponding DNA alignment
- Using PROTOGENE (Protein-to-Gene) in
T-Coffee, the amino-acid alignment is
transformed into the corresponding DNA
alignment. The underlying search is done with tBLASTn.
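Once the coding sequence for each protein is known, turning a protein alignment into a DNA alignment amounts to writing one codon per residue and three gap characters per protein gap. The sketch below illustrates only that threading step, under the assumption that each CDS is already in hand and in frame; it does not reproduce PROTOGENE's database search, and the name `thread_codons` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Map an ungapped, in-frame CDS onto one gapped protein alignment row."""
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    # Assumption: a single trailing stop codon may be present and is dropped.
    if len(cds) == 3 * (n_residues + 1) and cds[-3:].upper() in STOP_CODONS:
        cds = cds[:-3]
    if len(cds) != 3 * n_residues:
        raise ValueError("CDS length does not match the protein row")
    out = []
    pos = 0
    for aa in aligned_protein:
        if aa == gap:
            out.append(gap * 3)          # one protein gap = three DNA gaps
        else:
            out.append(cds[pos:pos + 3])  # next codon of the CDS
            pos += 3
    return "".join(out)


print(thread_codons("MR-SL", "ATGAGGTCTTTG"))  # ATGAGG---TCTTTG
```

Because gaps are always inserted three at a time, the result is codon-aligned by construction.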
18Results
19In case it takes too long
>gi|604533|gb|AAC37231.1|_G_L36554 _S_ AAC37231 _DESC_ fertilization protein MATCHES_ON Haliotis assimilis fertilization protein mRNA, complete cds
ATGAGGTCTTTGGTGCTTCTCTGTGTTTTGCTGATGGCAATATGTGCGGCGGAC------
------------------AAAAAAACCTCGGTCTCGAAGGAAAATGAAGCCGCAATGAAG
GTAGCGATGATGAAGTTTTTGGATATGAAGGCGGGTGTATTCAAAGAAATC---ATTGAG
GATATGGGATATCCAATAACCCCTCCGCAATGGACAACTCTACTGTACTACAACAGAGAG
AGATTGATTGAATTTTGCCGTTCCTTCCTTGCATTGTCCAAAAAGATTATATTGCTGGGA
GGTAACAAATTAAATAAGGCGAACTTCGCTAGGATGGGTCGAATCCTTGGCTGGAAAAGC
CAGTGGGCTGTGAGACAGAGGCAATGGGGGATGGTCAGA---------GTGTCGAGGCGC
CATACAAGTACTGCAATAGCTAAAAGGATCGTCGCCATGAAAGTTGCTGACCTACCCTGT
AAC------------------TAG
>gi|604531|gb|AAC37233.1|_G_L36590 _S_ AAC37233 _DESC_ fertilization protein MATCHES_ON Haliotis corrugata fertilization protein mRNA, complete cds
ATGAGGTTTTTGCTGCTTCTCTGTGTTTTGATGGGGGCAGTATCTCAGGCAGTATGCAGA
AAAAGACCTAATGTCTGGGGGAAAATCGTGGTCAAGGAGAAAAATAAAGCCGCAATGAAG
ATAGGGTTTATGGAATATTTGGATGCAAAGTTGGTAAAGTTTAAAAGGCACTGGCTTGTT
GGAGCCAATTGGAAACTTCAAAAATTTGAAACGGATGAAATGAGATACCTCGCCATAAAG
AGACTGATAAAAGTTTGCCATGGATACACTATTTGGTCCCAACGACTAATAATGTTAAAA
TATCGACCATTGAATGAGAAATACTTCAAAAAGGTGGGTCGATACCTTGCCTGGCGAAAC
TACCTCATAGTTTTTCGGATGTGGATCGGCGTTTTG------AAGAAAAATCTTAAAAGA
TCGGAAATAACGAAACCCATGCAAAAACTCCTCGACACAAAGGATGGTGAGTTGCCCTGC
20SNAP - Ds/Dn Calculation tool
- http://hcv.lanl.gov/content/sequence/SNAP/SNAP.html
- Calculates synonymous and non-synonymous
substitution rates for codon-aligned nucleotide
sequences according to the method of Nei and
Gojobori (1986).
- This program will only yield valid results if the
input alignment is codon-aligned
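A quick sanity check for the codon-alignment requirement can be sketched as follows. This is an illustrative test written for this handout, not something SNAP itself exposes; it assumes the reading frame starts at the first column and that "-" is the gap character. The name `is_codon_aligned` is hypothetical.

```python
def is_codon_aligned(rows: list[str], gap: str = "-") -> bool:
    """True if all rows share a length divisible by 3 and gaps fill whole codons."""
    if not rows:
        return False
    length = len(rows[0])
    if length % 3 != 0 or any(len(row) != length for row in rows):
        return False
    for row in rows:
        for start in range(0, length, 3):
            n_gaps = row[start:start + 3].count(gap)
            # A codon must be either fully present or fully gapped.
            if n_gaps not in (0, 3):
                return False
    return True


print(is_codon_aligned(["ATG---AAA", "ATGCCCAAA"]))
print(is_codon_aligned(["ATG--AAAA", "ATGCCCAAA"]))
```

Passing this check is necessary but not sufficient: an alignment can have whole-codon gaps and still be in the wrong frame.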
21SNAP - Ds/Dn Calculation tool
- Using the alignment we obtained previously.
- Averages of all pairwise comparisons:
- ds = 0.3510, dn = 0.3535, ds/dn = 0.8241
- The positive selection (ds/dn < 1, i.e. dn/ds > 1)
in sperm protein genes from abalone (genus
Haliotis) is assumed to be the result of
species-specific interaction with egg surface
proteins during fertilization
(Swanson and Vacquier 1998).
22Distmat
http://sbcr.bii.a-star.edu.sg/cgi-bin/emboss/menu/distmat
Distmat calculates the evolutionary distances
between every pair of sequences in a multiple
alignment. The distances are expressed as the
number of substitutions per 100 nucleotides or the
number of replacements per 100 amino acids
23Distmat
- Feed the DNA alignment of the 18-kDa protein into
distmat.
- Calculate separately the distances between the
sequences for codon positions 1 and 2, and for
codon position 3.
- Are the results in agreement with those from the
dn/ds analysis?
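To make the codon-position comparison concrete, the sketch below computes an uncorrected distance (mismatches per 100 compared sites) restricted to chosen codon positions. It is a simplified stand-in for illustration, not distmat's implementation: no multiple-substitution correction is applied, columns with a gap in either sequence are skipped, and the frame is assumed to start at column 1. The function name and toy sequences are hypothetical.

```python
def distance_per_100(a: str, b: str, positions=(1, 2, 3), gap: str = "-") -> float:
    """Uncorrected mismatches per 100 compared sites at the given codon positions."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for col, (x, y) in enumerate(zip(a.upper(), b.upper())):
        if (col % 3) + 1 not in positions:
            continue  # column belongs to a codon position we are not analysing
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites at the requested positions")
    return 100.0 * mismatches / compared


# Toy codon-aligned pair (hypothetical).
s1 = "ATGAGGTCT"
s2 = "ATGAGATTT"
print(round(distance_per_100(s1, s2, positions=(1, 2)), 2))
print(round(distance_per_100(s1, s2, positions=(3,)), 2))
```

Most changes at codon position 3 are synonymous, while most changes at positions 1 and 2 alter the amino acid, which is why the two distances can be compared with the dn/ds result.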
24Distmat
25http://dialign.gobics.de/anchor/submission.php
User manual:
http://dialign.gobics.de/anchor/manual
Align the following sequences (use the file
dalign_sequences.txt)
>seq1
WKKNADAPKRAMTSFMKAAY
>seq2
WNLDTNSPEEKQAYIQLAKDDRIRYD
>seq3
WRMDSNQKNPDSNNPKAAYNKGDANAPK
26Results
- DIALIGN composes alignments from fragments
- Lower-case letters denote residues not belonging
to any of these selected fragments. They are not
considered to be aligned.
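The case convention can be read programmatically. The following sketch takes the rows of an alignment printed this way and reports, for each column, which sequences carry an upper-case (aligned) residue. The rows used here are made up for illustration and are not real DIALIGN output; the function name is hypothetical.

```python
def aligned_rows_by_column(rows: list[str]) -> list[list[int]]:
    """For each column, list the row indices whose residue is upper-case."""
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("rows must have equal length")
    result = []
    for col in range(width):
        upper = [i for i, row in enumerate(rows) if row[col].isupper()]
        # A single upper-case residue has no partner to be aligned with.
        result.append(upper if len(upper) >= 2 else [])
    return result


# Hypothetical rows: lower-case letters and '-' are treated as not aligned.
rows = ["wkkNADAPK", "---NLDTns", "wrmDSNQ--"]
for col, members in enumerate(aligned_rows_by_column(rows)):
    print(col, members)
```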
27Results
- Numbers below the alignment roughly reflect the
degree of local similarity among the sequences
28Anchored alignment
- Now, let us assume that the user has some expert
knowledge concerning a certain domain that is
present in all the input sequences
- The domains marked in red in the three sequences
are thought to be homologous to one another
>seq1
WKKNADAPKRAMTSFMKAAY
>seq2
WNLDTNSPEEKQAYIQLAKDDRIRYD
>seq3
WRMDSNQKNPDSNNPKAAYNKGDANAPK
29- Therefore, the user wants to define this domain
as an anchor and align the rest of the sequences
automatically.
- To do so, the user specifies a set of anchor
points. Each anchor point is an equal-length
segment pair involving two of the input sequences,
defined by:
30- first sequence involved
- second sequence involved
- start of anchor in first sequence
- start of anchor in second sequence
- length of anchor
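The five fields listed above can be modelled as a small record that is checked against the input sequences before submission. This is a sketch under stated assumptions: sequence indices and start positions are taken to be 1-based, the anchor coordinates in the example are invented for illustration (the domain highlighted on the slide is not recoverable here), and the exact file format expected by the server is described in the user manual rather than reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    first_seq: int     # first sequence involved (assumed 1-based)
    second_seq: int    # second sequence involved (assumed 1-based)
    first_start: int   # start of anchor in first sequence (assumed 1-based)
    second_start: int  # start of anchor in second sequence (assumed 1-based)
    length: int        # length of anchor

    def segments(self, seqs: list[str]) -> tuple[str, str]:
        """Return the two equal-length segments, or raise if the anchor is invalid."""
        if self.first_seq == self.second_seq:
            raise ValueError("an anchor must involve two different sequences")
        if not (1 <= self.first_seq <= len(seqs) and 1 <= self.second_seq <= len(seqs)):
            raise ValueError("sequence index out of range")
        a = seqs[self.first_seq - 1]
        b = seqs[self.second_seq - 1]
        i = self.first_start - 1
        j = self.second_start - 1
        if i < 0 or j < 0 or self.length < 1:
            raise ValueError("starts and length must be positive")
        if i + self.length > len(a) or j + self.length > len(b):
            raise ValueError("anchor runs past the end of a sequence")
        return a[i:i + self.length], b[j:j + self.length]


seqs = [
    "WKKNADAPKRAMTSFMKAAY",
    "WNLDTNSPEEKQAYIQLAKDDRIRYD",
    "WRMDSNQKNPDSNNPKAAYNKGDANAPK",
]
# Hypothetical anchor between seq1 and seq3.
print(Anchor(1, 3, 5, 24, 4).segments(seqs))  # ('ADAP', 'ANAP')
```

Printing the two segments side by side is a cheap way to catch off-by-one errors in the coordinates before running the alignment.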
31Results
- The specified domain is aligned and the remainder
of the sequences is aligned automatically
respecting the constraints given by the anchor
points