Multiple Sequence Alignment - PowerPoint PPT Presentation

Slides: 32

Provided by: Niv76

1
Multiple Sequence Alignment

ClustalW
TCoffee
Ka, Ks, and Ka/Ks
Anchored alignment

2
ClustalW

http://www.ebi.ac.uk/clustalw/

3
ClustalW
4
ClustalW
Paste your sequences
Run
5
Results
Press "Start Jalview" for an interactive view of
the alignment
6
ClustalW output format
Guide Tree Cladogram
7
Exercise

HomoloGene is a system for automated detection of
homologs among the annotated genes of several
completely sequenced eukaryotic genomes.
Download the FASTA sequences of HomoloGene:5276
and align them with ClustalW

9
Result
10
TCoffee

http://www.tcoffee.org/

TCoffee computes its alignments by combining a
collection of smaller alignments
11
Main features

Multiple Sequence Alignment
Structure based Multiple Sequence Alignment
Combining the output of several multiple sequence
alignment packages
Combining two (or more) multiple sequence
alignments into a single one
Turning amino acid alignments into CDS nucleotide
alignments

12
Exercise

The 18-kDa protein plays an important role in
fertilization of several abalone species
Build a multiple sequence alignment using the
following sequences

13
Sequences

>gi|604533|gb|AAC37231.1| fertilization protein
MRSLVLLCVLLMAICAADKKTSVSKENEAAMKVAMMKFLDMKAGVFKEII
EDMGYPITPPQWTTLLYYNR
ERLIEFCRSFLALSKKIILLGGNKLNKANFARMGRILGWKSQWAVRQRQW
GMVRVSRRHTSTAIAKRIVA
MKVADLPCN
>gi|604531|gb|AAC37233.1| fertilization protein
MRFLLLLCVLMGAVSQAVCRKRPNVWGKIVVKEKNKAAMKIGFMEYLDAK
LVKFKRHWLVGANWKLQKFE
TDEMRYLAIKRLIKVCHGYTIWSQRLIMLKYRPLNEKYFKKVGRYLAWRN
YLIVFRMWIGVLKKNLKRSE
ITKPMQKLLDTKDGELPCPVRKIHG
>gi|604529|gb|AAC37232.1| fertilization protein
MRSLVLLCVLMAVGCVAFDDVVVSRQEQSYVQRGMVNFLDEEMHKLVKRF
RDMRWNLGPGFVFLLKKVNR
ERMMRYCMDYARYSKKILQLKHLPVNKKTLTKMGRFVGYRNYGVIRELYA
DVFRDVQGFRGPKMTAAMRK
YSSKDPGTFPCKNEKRRG
>gi|604527|gb|AAC37230.1| fertilization protein
MRSLVLLCVLLMAICAADKKTTVSKENAAAMKIAMIKFLDARAGKFKKRV
ENMGYPITPPQWTTLLYYNR
QRLMEWCHTYVEFSKKIILMGGNKLNKKNFTRMGRIIGWKNQWVLKRRQW
EMVRVMRRYKSTAIAKKIVA
MKVADLPCN
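
Before pasting sequences into a web form, it can help to confirm that they parse as expected. The following is a minimal sketch of a FASTA reader, assuming the standard convention that every header line starts with `>` and that sequence lines may be wrapped at any width. The function name `parse_fasta` and the demo records are illustrative only and are not part of any tool mentioned in the slides.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Wrapped sequence lines are joined back into one string.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical demo input, not taken from the exercise.
example = """>seqA demo record
MRSLV
LLCV
>seqB demo record
MRFLL
"""

records = parse_fasta(example)
for name, seq in records:
    print(name.split()[0], len(seq))
```

Comparing the reported lengths with the lengths given in the database entries is a quick way to catch a sequence that was truncated while copying.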

14
Choose "TCoffee Regular", paste the sequences into
the data box, and press Submit
16
Estimating the rate of evolution

In order to study selection patterns, you will
need to have the corresponding DNA alignment
Using PROTOGENE (Protein-to-Gene) in TCoffee,
the amino-acid alignment is transformed into
the corresponding DNA alignment. The underlying
search procedure is tBLASTn.
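
The core idea of turning a protein alignment into a DNA alignment can be sketched in a few lines. This is not PROTOGENE's implementation: it assumes the coding sequence of each protein is already known (the slides say PROTOGENE retrieves it with tBLASTn) and simply replaces each residue by its codon and each gap by three gap characters. The function name `thread_cds` is hypothetical.

```python
def thread_cds(aligned_protein, cds, gap="-"):
    """Map one aligned protein row onto its ungapped coding sequence."""
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) < 3 * n_residues:
        raise ValueError("coding sequence is too short for the protein")
    out, pos = [], 0
    for aa in aligned_protein:
        if aa == gap:
            out.append(gap * 3)           # one protein gap becomes one gapped codon
        else:
            out.append(cds[pos:pos + 3])  # consume the next codon of the CDS
            pos += 3
    return "".join(out)


# Toy example: the protein row "MR-SL" with the CDS ATG AGG TCT TTG.
row = thread_cds("MR-SL", "ATGAGGTCTTTG")
print(row)  # ATGAGG---TCTTTG
```

Because gaps are always inserted three at a time, the resulting DNA alignment keeps every sequence in the same reading frame.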

18
Results
19
In case it takes too long

>gi|604533|gb|AAC37231.1|_G_L36554 _S_ AAC37231 _DESC_ fertilization protein MATCHES_ON Haliotis assimilis fertilization protein mRNA, complete cds
ATGAGGTCTTTGGTGCTTCTCTGTGTTTTGCTGATGGCAATATGTGCGGC
GGAC------
------------------AAAAAAACCTCGGTCTCGAAGGAAAATGAAGC
CGCAATGAAG
GTAGCGATGATGAAGTTTTTGGATATGAAGGCGGGTGTATTCAAAGAAAT
C---ATTGAG
GATATGGGATATCCAATAACCCCTCCGCAATGGACAACTCTACTGTACTA
CAACAGAGAG
AGATTGATTGAATTTTGCCGTTCCTTCCTTGCATTGTCCAAAAAGATTAT
ATTGCTGGGA
GGTAACAAATTAAATAAGGCGAACTTCGCTAGGATGGGTCGAATCCTTGG
CTGGAAAAGC
CAGTGGGCTGTGAGACAGAGGCAATGGGGGATGGTCAGA---------GT
GTCGAGGCGC
CATACAAGTACTGCAATAGCTAAAAGGATCGTCGCCATGAAAGTTGCTGA
CCTACCCTGT
AAC------------------TAG
>gi|604531|gb|AAC37233.1|_G_L36590 _S_ AAC37233 _DESC_ fertilization protein MATCHES_ON Haliotis corrugata fertilization protein mRNA, complete cds
ATGAGGTTTTTGCTGCTTCTCTGTGTTTTGATGGGGGCAGTATCTCAGGC
AGTATGCAGA
AAAAGACCTAATGTCTGGGGGAAAATCGTGGTCAAGGAGAAAAATAAAGC
CGCAATGAAG
ATAGGGTTTATGGAATATTTGGATGCAAAGTTGGTAAAGTTTAAAAGGCA
CTGGCTTGTT
GGAGCCAATTGGAAACTTCAAAAATTTGAAACGGATGAAATGAGATACCT
CGCCATAAAG
AGACTGATAAAAGTTTGCCATGGATACACTATTTGGTCCCAACGACTAAT
AATGTTAAAA
TATCGACCATTGAATGAGAAATACTTCAAAAAGGTGGGTCGATACCTTGC
CTGGCGAAAC
TACCTCATAGTTTTTCGGATGTGGATCGGCGTTTTG------AAGAAAAA
TCTTAAAAGA
TCGGAAATAACGAAACCCATGCAAAAACTCCTCGACACAAAGGATGGTGA
GTTGCCCTGC

20
SNAP - Ds/Dn Calculation tool

http://hcv.lanl.gov/content/sequence/SNAP/SNAP.html
Calculates synonymous and non-synonymous
substitution rates for codon-aligned nucleotide
sequences according to the method of Nei and
Gojobori (1986).
This program will only yield valid results if the
input alignment is codon-aligned
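
A quick sanity check before submitting an alignment might look like the sketch below. It uses one working definition of "codon-aligned", which is an assumption here and may not match SNAP's own checks exactly: all rows have the same length, that length is a multiple of three, and every in-frame triplet is either fully gapped or gap-free. The function name `is_codon_aligned` is illustrative.

```python
def is_codon_aligned(rows, gap="-"):
    """Check that gaps only occur as whole in-frame codons."""
    if not rows:
        return False
    length = len(rows[0])
    if length % 3 != 0 or any(len(r) != length for r in rows):
        return False
    for row in rows:
        for i in range(0, length, 3):
            n_gaps = row[i:i + 3].count(gap)
            if n_gaps not in (0, 3):
                return False  # a partly gapped codon breaks the reading frame
    return True


good = is_codon_aligned(["ATGAGG---TCT", "ATGAGGTTTTCT"])
bad = is_codon_aligned(["ATGA--GGTCT-", "ATGAGGTTTTCT"])
print(good, bad)  # True False
```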

21
SNAP - Ds/Dn Calculation tool

Using the alignment we obtained previously.
Averages of all pairwise comparisons:
ds = 0.3510, dn = 0.3535, ds/dn = 0.8241
The positive selection in sperm protein genes
from abalone (genus Haliotis) is assumed to be
the result of species-specific interaction with
egg surface proteins during fertilization
(Swanson and Vacquier 1998).

22
Distmat
http://sbcr.bii.a-star.edu.sg/cgi-bin/emboss/menu/distmat
Distmat calculates the evolutionary distances
between every pair of sequences in a multiple
alignment. The distances are expressed as the
number of substitutions per 100 nucleotides or
the number of replacements per 100 amino acids
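
To make the "per 100 positions" unit concrete, here is a minimal sketch of an uncorrected pairwise distance. It is not Distmat's code: it assumes that columns with a gap in either sequence are simply skipped and applies no multiple-substitution correction. The names `percent_distance` and `distance_matrix` and the demo alignment are hypothetical.

```python
from itertools import combinations


def percent_distance(a, b, gap="-"):
    """Uncorrected distance: mismatches per 100 compared positions."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    compared = mismatches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are ignored
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * mismatches / compared


def distance_matrix(alignment):
    """Distances for every pair of rows in a {name: row} alignment."""
    return {
        (p, q): percent_distance(alignment[p], alignment[q])
        for p, q in combinations(alignment, 2)
    }


# Toy alignment, not the abalone data.
demo = {"s1": "ATGAGGTCTT", "s2": "ATGAGGTTTT", "s3": "ATG---TCTA"}
matrix = distance_matrix(demo)
for pair, d in matrix.items():
    print(pair, round(d, 2))
```

The same function works for protein alignments, in which case the result reads as replacements per 100 amino acids.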
23
Distmat

Feed the DNA alignment of 18-kDa protein into
distmat.
Calculate separately the distances between the
sequences for codon positions 1 and 2, and for
codon position 3.
Are the results in agreement with those from the
dn/ds analysis?
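
If the positions need to be separated by hand, a codon-aligned row can be split as in the sketch below. It assumes the row starts in frame at codon position 1; the function name `split_codon_positions` is illustrative and not part of Distmat.

```python
def split_codon_positions(row):
    """Return (positions 1 and 2, position 3) from one codon-aligned row."""
    if len(row) % 3 != 0:
        raise ValueError("row length must be a multiple of three")
    first_second = "".join(row[i:i + 2] for i in range(0, len(row), 3))
    third = row[2::3]
    return first_second, third


# ATG AGG TCT: positions 1+2 give AT, AG, TC; position 3 gives G, G, T.
p12, p3 = split_codon_positions("ATGAGGTCT")
print(p12, p3)  # ATAGTC GGT
```

Most third-position changes are synonymous, while changes at positions 1 and 2 usually alter the amino acid, so the two sets of distances give a rough counterpart to ds and dn.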

24
Distmat
25
http://dialign.gobics.de/anchor/submission.php
User manual
http://dialign.gobics.de/anchor/manual
Align the following sequences (use the file
dalign_sequences.txt)
>seq1
WKKNADAPKRAMTSFMKAAY
>seq2
WNLDTNSPEEKQAYIQLAKDDRIRYD
>seq3
WRMDSNQKNPDSNNPKAAYNKGDANAPK
26
Results

DIALIGN composes alignments from fragments
Lower-case letters denote residues not belonging
to any of these selected fragments. They are not
considered to be aligned.

27
Results

Numbers below the alignment roughly reflect the
degree of local similarity among the sequences

28
Anchored alignment

Now, let us assume that the user has some expert
knowledge concerning a certain domain that is
present in all the input sequences
The domains marked in red in the three sequences
are thought to be homologous to one another

>seq1
WKKNADAPKRAMTSFMKAAY
>seq2
WNLDTNSPEEKQAYIQLAKDDRIRYD
>seq3
WRMDSNQKNPDSNNPKAAYNKGDANAPK
29

Therefore, the user wants to define this domain
as an anchor and align the rest of the sequences
automatically.
To specify a set of anchor points, define each
anchor point as a pair of equal-length segments,
one from each of two input sequences
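
An anchor point of this kind can be represented as a small record and written out one per line, as sketched below. The six-column layout used here (the two sequence numbers, the two start positions, the segment length, and a score) is an assumption that should be checked against the user manual, and the positions in the example are hypothetical because the domain boundaries are not given in the transcript. The names `Anchor`, `make_anchor`, and `format_anchor` are illustrative.

```python
from typing import NamedTuple


class Anchor(NamedTuple):
    seq_a: int     # 1-based index of the first sequence
    seq_b: int     # 1-based index of the second sequence
    start_a: int   # 1-based start of the segment in the first sequence
    start_b: int   # 1-based start of the segment in the second sequence
    length: int    # both segments share this length
    score: float   # user-assigned priority of the anchor


def make_anchor(seqs, seq_a, seq_b, start_a, start_b, length, score=1.0):
    """Validate that both segments fit inside their sequences."""
    if seq_a == seq_b:
        raise ValueError("an anchor must involve two different sequences")
    if length < 1:
        raise ValueError("segment length must be positive")
    for idx, start in ((seq_a, start_a), (seq_b, start_b)):
        if start < 1 or start + length - 1 > len(seqs[idx - 1]):
            raise ValueError(f"segment falls outside sequence {idx}")
    return Anchor(seq_a, seq_b, start_a, start_b, length, score)


def format_anchor(anchor):
    """Render one anchor as a whitespace-separated line (assumed layout)."""
    return " ".join(str(field) for field in anchor)


seqs = [
    "WKKNADAPKRAMTSFMKAAY",
    "WNLDTNSPEEKQAYIQLAKDDRIRYD",
    "WRMDSNQKNPDSNNPKAAYNKGDANAPK",
]
# Hypothetical anchor: 4 residues from position 5 of seq1 and position 3 of seq2.
line = format_anchor(make_anchor(seqs, 1, 2, 5, 3, 4))
print(line)  # 1 2 5 3 4 1.0
```

A domain shared by all three sequences would need at least two such pairwise anchors, for example one between sequences 1 and 2 and one between sequences 2 and 3.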