Title: Multiple Sequence Alignment
1Multiple Sequence Alignment
- ClustalW
- TCoffee
- Ka, Ks, and Ka/Ks
- Anchored alignment
2ClustalW
- http://www.ebi.ac.uk/clustalw/
3ClustalW
Paste your sequences
Multiple sequence alignment options
Submit
4Exercise
- HomoloGene is a system for automated detection of
homologs among annotated genes of several
completely sequenced eukaryotic genomes.
- Download the FASTA sequences of HomoloGene:5276
and align them with ClustalW
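Before pasting the downloaded sequences into ClustalW, it can help to check that the FASTA text contains the records you expect. The following is a minimal sketch; `read_fasta` is a hypothetical helper written for this exercise, not part of ClustalW or HomoloGene.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict {record id: sequence}.

    The record id is taken as the first whitespace-delimited token
    after '>' (an assumption; adjust if your headers differ).
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    # Join the wrapped sequence lines of each record
    return {n: "".join(parts) for n, parts in records.items()}


if __name__ == "__main__":
    example = ">seqA example\nMRSLV\nLLCVL\n>seqB\nMRFLL\n"
    for rec_id, seq in read_fasta(example).items():
        print(rec_id, len(seq))
```

Printing the record ids and lengths is a quick way to confirm that every species is present once before submitting.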
5Download protein sequences
6Result
Alignment
Guide Tree
7TCoffee
TCoffee computes its alignments by combining a
collection of smaller alignments
8Alignment at the DNA level based on an alignment
at the Protein Level
- The 18-kDa protein plays an important role in
fertilization of several abalone species
- Build a multiple sequence alignment using the
following sequences
9Sequences
>gi|604533|gb|AAC37231.1| fertilization protein
MRSLVLLCVLLMAICAADKKTSVSKENEAAMKVAMMKFLDMKAGVFKEIIEDMGYPITPPQWTTLLYYNR
ERLIEFCRSFLALSKKIILLGGNKLNKANFARMGRILGWKSQWAVRQRQWGMVRVSRRHTSTAIAKRIVA
MKVADLPCN
>gi|604531|gb|AAC37233.1| fertilization protein
MRFLLLLCVLMGAVSQAVCRKRPNVWGKIVVKEKNKAAMKIGFMEYLDAKLVKFKRHWLVGANWKLQKFE
TDEMRYLAIKRLIKVCHGYTIWSQRLIMLKYRPLNEKYFKKVGRYLAWRNYLIVFRMWIGVLKKNLKRSE
ITKPMQKLLDTKDGELPCPVRKIHG
>gi|604529|gb|AAC37232.1| fertilization protein
MRSLVLLCVLMAVGCVAFDDVVVSRQEQSYVQRGMVNFLDEEMHKLVKRFRDMRWNLGPGFVFLLKKVNR
ERMMRYCMDYARYSKKILQLKHLPVNKKTLTKMGRFVGYRNYGVIRELYADVFRDVQGFRGPKMTAAMRK
YSSKDPGTFPCKNEKRRG
>gi|604527|gb|AAC37230.1| fertilization protein
MRSLVLLCVLLMAICAADKKTTVSKENAAAMKIAMIKFLDARAGKFKKRVENMGYPITPPQWTTLLYYNR
QRLMEWCHTYVEFSKKIILMGGNKLNKKNFTRMGRIIGWKNQWVLKRRQWEMVRVMRRYKSTAIAKKIVA
MKVADLPCN
10Choose TCoffee Regular, paste the sequences in
the data box, and press submit
11Download formats
Guide tree
12Codon Alignment
- In order to study selection patterns, you will
need the corresponding DNA alignment
- PROTOGENE (Protein-to-Gene) in TCoffee transforms
the amino-acid alignment into a codon alignment. The
actual procedure involves tBLASTn.
13
- PROTOGENE (in TCoffee) is time-consuming. Please
submit your email address, and the results will
be emailed to you.
- PROTOGENE may return more than one DNA sequence
for any given protein sequence. For your homework
assignment, please choose one sequence for each
species.
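The idea of turning a protein alignment into a codon alignment can be sketched in a few lines, assuming the matching coding sequence for each protein is already in hand. PROTOGENE finds that DNA with tBLASTn, and that search is not shown here. `thread_codons` is a hypothetical helper for illustration only, not PROTOGENE's actual code.

```python
def thread_codons(aligned_protein, cds):
    """Map an unaligned coding sequence onto a gapped protein sequence.

    Every residue consumes the next codon of `cds`; every gap ('-')
    in the protein alignment becomes a three-base gap ('---').
    Assumes `cds` starts in frame and encodes the protein residue by residue.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if n_residues > len(codons):
        raise ValueError("coding sequence is too short for this protein")
    out = []
    k = 0  # index of the next unused codon
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")
        else:
            out.append(codons[k])
            k += 1
    return "".join(out)


if __name__ == "__main__":
    # Toy example: Met, gap, Lys
    print(thread_codons("M-K", "ATGAAA"))
```

Because gaps are inserted in whole codons, the reading frame is preserved, which is what downstream synonymous/nonsynonymous analyses require.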
14(Result) Codon alignment
>gi|604533|gb|AAC37231.1|_G_L36554 _S_ AAC37231 _DESC_ fertilization protein MATCHES_ON Haliotis assimilis fertilization protein mRNA, complete cds
ATGAGGTCTTTGGTGCTTCTCTGTGTTTTGCTGATGGCAATATGTGCGGCGGAC------
------------------AAAAAAACCTCGGTCTCGAAGGAAAATGAAGCCGCAATGAAG
GTAGCGATGATGAAGTTTTTGGATATGAAGGCGGGTGTATTCAAAGAAATC---ATTGAG
GATATGGGATATCCAATAACCCCTCCGCAATGGACAACTCTACTGTACTACAACAGAGAG
AGATTGATTGAATTTTGCCGTTCCTTCCTTGCATTGTCCAAAAAGATTATATTGCTGGGA
GGTAACAAATTAAATAAGGCGAACTTCGCTAGGATGGGTCGAATCCTTGGCTGGAAAAGC
CAGTGGGCTGTGAGACAGAGGCAATGGGGGATGGTCAGA---------GTGTCGAGGCGC
CATACAAGTACTGCAATAGCTAAAAGGATCGTCGCCATGAAAGTTGCTGACCTACCCTGT
AAC------------------TAG
>gi|604531|gb|AAC37233.1|_G_L36590 _S_ AAC37233 _DESC_ fertilization protein MATCHES_ON Haliotis corrugata fertilization protein mRNA, complete cds
ATGAGGTTTTTGCTGCTTCTCTGTGTTTTGATGGGGGCAGTATCTCAGGCAGTATGCAGA
AAAAGACCTAATGTCTGGGGGAAAATCGTGGTCAAGGAGAAAAATAAAGCCGCAATGAAG
ATAGGGTTTATGGAATATTTGGATGCAAAGTTGGTAAAGTTTAAAAGGCACTGGCTTGTT
GGAGCCAATTGGAAACTTCAAAAATTTGAAACGGATGAAATGAGATACCTCGCCATAAAG
AGACTGATAAAAGTTTGCCATGGATACACTATTTGGTCCCAACGACTAATAATGTTAAAA
TATCGACCATTGAATGAGAAATACTTCAAAAAGGTGGGTCGATACCTTGCCTGGCGAAAC
TACCTCATAGTTTTTCGGATGTGGATCGGCGTTTTG------AAGAAAAATCTTAAAAGA
TCGGAAATAACGAAACCCATGCAAAAACTCCTCGACACAAAGGATGGTGAGTTGCCCTGC
15SNAP - Ds/Dn Calculation Tool
- http://hcv.lanl.gov/content/sequence/SNAP/SNAP.html
- Calculates synonymous and nonsynonymous
substitution rates based on codon alignments
according to the method of Nei and Gojobori (1986).
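To make the calculation concrete, here is a simplified sketch of a Nei-Gojobori style estimate for one pair of codon-aligned sequences. It is not SNAP's implementation. The simplifications are assumptions of this sketch: mutations to stop codons are counted as nonsynonymous, mutational pathways through stop codons are skipped, codons containing gaps or stops are ignored, and the function names are hypothetical.

```python
from itertools import permutations, product
from math import log

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Number of synonymous sites in a codon (between 0 and 3)."""
    s = 0.0
    for i in range(3):
        for b in BASES:
            if b != codon[i]:
                mutant = codon[:i] + b + codon[i + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3  # one of three possible changes at this site
    return s


def count_diffs(c1, c2):
    """Synonymous and nonsynonymous differences, averaged over pathways."""
    pos = [i for i in range(3) if c1[i] != c2[i]]
    sd = nd = 0.0
    paths = 0
    for order in permutations(pos):
        cur, s, n, ok = c1, 0, 0, True
        for i in order:
            nxt = cur[:i] + c2[i] + cur[i + 1:]
            if CODE[nxt] == "*":  # skip pathways through a stop codon
                ok = False
                break
            if CODE[nxt] == CODE[cur]:
                s += 1
            else:
                n += 1
            cur = nxt
        if ok:
            sd, nd, paths = sd + s, nd + n, paths + 1
    return (sd / paths, nd / paths) if paths else (0.0, float(len(pos)))


def jukes_cantor(p):
    """Jukes-Cantor correction; infinite when the proportion saturates."""
    return -0.75 * log(1 - 4 * p / 3) if p < 0.75 else float("inf")


def nei_gojobori(seq1, seq2):
    """Return (dS, dN) for two codon-aligned DNA sequences."""
    S = Sd = Nd = 0.0
    n_codons = 0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue  # ignore codons with gaps, ambiguity codes, or stops
        S += (syn_sites(c1) + syn_sites(c2)) / 2
        sd, nd = count_diffs(c1, c2)
        Sd, Nd, n_codons = Sd + sd, Nd + nd, n_codons + 1
    N = 3 * n_codons - S
    ps = Sd / S if S else 0.0
    pn = Nd / N if N else 0.0
    return jukes_cantor(ps), jukes_cantor(pn)


if __name__ == "__main__":
    ds, dn = nei_gojobori("ATGGCCGCCGCCGCC", "ATGGCTGCCGCCGCC")
    print(ds, dn)
```

A dN/dS ratio above 1 for a pair is the usual signal of positive selection, which is what the comparisons in this exercise are checked for.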
16Input codon alignment
Select output statistics
17SNAP - Ds/Dn Calculation Tool
- Conclusion: We detect positive selection in six
of the comparisons. So did Swanson and Vacquier
(1998).
18Distmat
http://emboss.bioinformatics.nl/cgi-bin/emboss/distmat
Distmat calculates the evolutionary distances
between every pair of sequences in a multiple
alignment. The distances are expressed as the
number of substitutions per 100 nucleotides or
per 100 amino acids.
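The simplest distance of this kind, the uncorrected proportion of differing positions scaled to 100, can be sketched as follows. Distmat also offers corrected distances, which this sketch does not implement. The function names and the choice to skip any column with a gap are assumptions made for illustration.

```python
def p_distance(a, b):
    """Uncorrected distance: mismatches per 100 compared positions.

    Columns where either sequence has a gap ('-') are skipped.
    """
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        return float("nan")  # nothing to compare
    mismatches = sum(x != y for x, y in pairs)
    return 100.0 * mismatches / len(pairs)


def distance_matrix(alignment):
    """Distances for every pair of sequences in {name: aligned sequence}."""
    names = list(alignment)
    return {
        (p, q): p_distance(alignment[p], alignment[q])
        for i, p in enumerate(names)
        for q in names[i + 1:]
    }


if __name__ == "__main__":
    toy = {"a": "ACGT-A", "b": "ACGA-A", "c": "TCGAAA"}
    for pair, d in distance_matrix(toy).items():
        print(pair, round(d, 1))
```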
19Distmat
- Feed the DNA alignment of the 18-kDa protein into
distmat.
- Calculate separately the distances between the
sequences for codon positions 1 and 2, and for
codon position 3.
- Are the results in agreement with those from the
dn/ds analysis?
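The split by codon position can also be reproduced by hand, which helps when interpreting the distmat output. Most third-position changes are synonymous, while most changes at positions 1 and 2 alter the amino acid, so the two distances act as rough stand-ins for ds and dn. This is a minimal sketch: `codon_positions` and `percent_diff` are hypothetical helpers, and the alignment is assumed to start in frame.

```python
def codon_positions(seq, positions):
    """Keep only the bases at the given codon positions (1, 2, or 3)."""
    return "".join(base for i, base in enumerate(seq)
                   if (i % 3) + 1 in positions)


def percent_diff(a, b):
    """Mismatches per 100 compared positions, skipping gap columns."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return float("nan")
    return 100.0 * sum(x != y for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    s1, s2 = "ATGGCCAAA", "ATGGCTAAG"  # toy in-frame sequences
    d12 = percent_diff(codon_positions(s1, (1, 2)), codon_positions(s2, (1, 2)))
    d3 = percent_diff(codon_positions(s1, (3,)), codon_positions(s2, (3,)))
    print("positions 1+2:", d12, "position 3:", d3)
```

If the distance at positions 1 and 2 is as large as, or larger than, the distance at position 3, that points in the same direction as a high dn/ds ratio.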
20Distmat
21Distmat
22Anchored multiple-sequence alignment with DIALIGN
http://dialign.gobics.de/anchor/submission.php
User manual:
http://dialign.gobics.de/anchor/manual
23
- Align the following sequences (use the file
dalign_sequences.txt)
>seq1
WKKNADAPKRAMTSFMKAAY
>seq2
WNLDTNSPEEKQAYIQLAKDDRIRYD
>seq3
WRMDSNQKNPDSNNPKAAYNKGDANAPK
24Results
- DIALIGN makes alignments from fragments
25Results
- Numbers below the alignment reflect some rough
degree of local similarity among the sequences
26Anchored alignment
- Now, let us assume that the user has some expert
knowledge concerning a certain domain that is
present in all the input sequences
- The domains marked in red in the three sequences
are thought to be homologous to one another
>seq1
WKKNADAPKRAMTSFMKAAY
>seq2
WNLDTNSPEEKQAYIQLAKDDRIRYD
>seq3
WRMDSNQKNPDSNNPKAAYNKGDANAPK
27
- Therefore, the user wants to define this domain
as an anchor and align the rest of the sequences
automatically.
- Each anchor point corresponds to an equal-length
segment pair involving two of the input sequences.
To specify a set of anchor points, the following
should be defined for each one:
28
- first sequence involved
- second sequence involved
- start of anchor in first sequence
- start of anchor in second sequence
- length of anchor
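The five fields above can be pictured as a small record. The sketch below checks that an anchor actually fits inside the two sequences and shows which residues it pairs. It is illustrative only: the `Anchor` type, the 1-based numbering, and the example coordinates are assumptions of this sketch, not the domain marked on the slide and not necessarily the exact input syntax DIALIGN expects (see the user manual for that).

```python
from typing import NamedTuple


class Anchor(NamedTuple):
    seq_a: int    # first sequence involved (1-based index, assumed)
    seq_b: int    # second sequence involved (1-based index, assumed)
    start_a: int  # start of anchor in first sequence (1-based, assumed)
    start_b: int  # start of anchor in second sequence (1-based, assumed)
    length: int   # length of anchor


def anchored_segments(seqs, anchor):
    """Return the two equal-length segments that an anchor ties together."""
    a = seqs[anchor.seq_a - 1]
    b = seqs[anchor.seq_b - 1]
    seg_a = a[anchor.start_a - 1:anchor.start_a - 1 + anchor.length]
    seg_b = b[anchor.start_b - 1:anchor.start_b - 1 + anchor.length]
    if len(seg_a) != anchor.length or len(seg_b) != anchor.length:
        raise ValueError("anchor runs past the end of a sequence")
    return seg_a, seg_b


if __name__ == "__main__":
    seqs = [
        "WKKNADAPKRAMTSFMKAAY",
        "WNLDTNSPEEKQAYIQLAKDDRIRYD",
        "WRMDSNQKNPDSNNPKAAYNKGDANAPK",
    ]
    # Hypothetical anchor chosen for illustration: a shared 4-residue
    # stretch in seq1 and seq3.
    print(anchored_segments(seqs, Anchor(1, 3, 17, 16, 4)))
```

Checking anchors this way before submission catches off-by-one errors in the start positions, which would otherwise force unrelated residues into the same alignment columns.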
29Results
- The specified domain is aligned and the remainder
of the sequences is aligned automatically
respecting the constraints given by the anchor
points
30Guidance/HoT
31
>seq1
WKKNADAPKRAMTSFMKAAY
>seq2
WNLDTNSPEEKQAYIQLAKDDRIRYD
>seq3
WRMDSNQKNPDSNNPKAAYNKGDANAPK
>seq4
WRMDSNQKNPNNPKAAYNKGDANAPK