Multiple Sequence Alignment - PowerPoint PPT Presentation

About This Presentation

Slides: 33

Provided by: Niv76

Learn more at: http://nsmn1.uh.edu

Transcript and Presenter's Notes

1
Multiple Sequence Alignment

ClustalW
TCoffee
Ka, Ks, and Ka/Ks
Anchored alignment

2
ClustalW

http://www.ebi.ac.uk/clustalw/

3
ClustalW
Paste your sequences
Multiple sequence alignment options
Submit
4
Exercise

HomoloGene is a system for the automated detection of homologs among the annotated genes of several completely sequenced eukaryotic genomes.
Download the FASTA sequences of HomoloGene:5276 and align them with ClustalW.

5
Download protein sequences
6
Result
Alignment
Guide Tree
7
TCoffee

http://tcoffee.crg.cat/

TCoffee computes its multiple alignment by combining a collection ("library") of smaller pairwise alignments.
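One way TCoffee combines the pairwise alignments is a consistency step: a residue pair gets extra weight when a residue in a third sequence is aligned to both members of the pair. The sketch below illustrates that triplet-style library extension on a toy library. The residue encoding, the weights, and the function name `extend_library` are hypothetical, and this is far simpler than TCoffee's own implementation.

```python
from collections import defaultdict
from typing import Dict, Tuple

Residue = Tuple[str, int]  # (sequence name, 0-based residue position)
Library = Dict[Tuple[Residue, Residue], float]


def extend_library(primary: Library) -> Library:
    """Triplet extension: E(x, y) = W(x, y) + sum_z min(W(x, z), W(z, y)),
    where z is a residue of a third sequence aligned to both x and y."""
    # Make the primary library symmetric so W(x, y) == W(y, x).
    weights: Library = {}
    for (x, y), w in primary.items():
        weights[(x, y)] = weights[(y, x)] = w

    # For every residue: the residues it is aligned to, and with what weight.
    partners: Dict[Residue, Dict[Residue, float]] = defaultdict(dict)
    for (x, y), w in weights.items():
        partners[x][y] = w

    extended: Dict[Tuple[Residue, Residue], float] = defaultdict(float)
    for pair, w in weights.items():
        extended[pair] += w  # start from the direct pairwise weight
    # Each residue z supports every pair of its partners that come from
    # two other, distinct sequences.
    for z, nbrs in partners.items():
        for x, w_xz in nbrs.items():
            for y, w_zy in nbrs.items():
                if x[0] != y[0] and z[0] not in (x[0], y[0]):
                    extended[(x, y)] += min(w_xz, w_zy)
    return dict(extended)


lib = {
    (("A", 0), ("B", 0)): 50,
    (("A", 0), ("C", 0)): 80,
    (("C", 0), ("B", 0)): 70,
    (("A", 0), ("B", 1)): 40,
}
ext = extend_library(lib)
print(ext[(("A", 0), ("B", 0))])  # 120.0 (= 50 + min(80, 70))
print(ext[(("A", 0), ("B", 1))])  # 40.0 (no third sequence supports this pair)
```

Pairs confirmed through other sequences end up outweighing pairs supported by a single pairwise alignment, which is what lets the progressive step favor consistent residue matches.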
8
Alignment at the DNA level based on an alignment at the protein level

The 18-kDa protein plays an important role in fertilization in several abalone species.
Build a multiple sequence alignment using the following sequences:

9
Sequences

>gi|604533|gb|AAC37231.1| fertilization protein
MRSLVLLCVLLMAICAADKKTSVSKENEAAMKVAMMKFLDMKAGVFKEIIEDMGYPITPPQWTTLLYYNR
ERLIEFCRSFLALSKKIILLGGNKLNKANFARMGRILGWKSQWAVRQRQWGMVRVSRRHTSTAIAKRIVA
MKVADLPCN
>gi|604531|gb|AAC37233.1| fertilization protein
MRFLLLLCVLMGAVSQAVCRKRPNVWGKIVVKEKNKAAMKIGFMEYLDAKLVKFKRHWLVGANWKLQKFE
TDEMRYLAIKRLIKVCHGYTIWSQRLIMLKYRPLNEKYFKKVGRYLAWRNYLIVFRMWIGVLKKNLKRSE
ITKPMQKLLDTKDGELPCPVRKIHG
>gi|604529|gb|AAC37232.1| fertilization protein
MRSLVLLCVLMAVGCVAFDDVVVSRQEQSYVQRGMVNFLDEEMHKLVKRFRDMRWNLGPGFVFLLKKVNR
ERMMRYCMDYARYSKKILQLKHLPVNKKTLTKMGRFVGYRNYGVIRELYADVFRDVQGFRGPKMTAAMRK
YSSKDPGTFPCKNEKRRG
>gi|604527|gb|AAC37230.1| fertilization protein
MRSLVLLCVLLMAICAADKKTTVSKENAAAMKIAMIKFLDARAGKFKKRVENMGYPITPPQWTTLLYYNR
QRLMEWCHTYVEFSKKIILMGGNKLNKKNFTRMGRIIGWKNQWVLKRRQWEMVRVMRRYKSTAIAKKIVA
MKVADLPCN

10
Choose TCoffee Regular, paste the sequences into the data box, and click Submit.
11
Download formats
Guide tree
12
Codon Alignment

To study selection patterns, you will need the corresponding DNA alignment.
Using PROTOGENE (Protein-to-Gene) in TCoffee, the amino-acid alignment is transformed into a codon alignment. The procedure relies on tBLASTn (a protein query searched against translated nucleotide sequences) to retrieve the DNA sequence that encodes each protein.
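Once a coding sequence has been found for each protein, turning the protein alignment into a codon alignment amounts to replacing every residue by its codon and every gap by three gaps. A minimal sketch, assuming the coding sequence is in frame and encodes exactly the ungapped protein; the function name `thread_codons` is hypothetical and this is not PROTOGENE's code.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Turn one aligned protein sequence into an aligned codon sequence.

    Each residue is replaced by the next codon of `cds`, each gap by three gaps.
    Assumes `cds` is in frame and encodes the ungapped protein; anything after
    the last encoded residue (e.g. a stop codon) is ignored.
    """
    n_residues = len(aligned_protein) - aligned_protein.count(gap)
    if len(cds) < 3 * n_residues:
        raise ValueError("coding sequence is too short for this protein")
    codons = iter(cds[i:i + 3] for i in range(0, 3 * n_residues, 3))
    return "".join(gap * 3 if aa == gap else next(codons) for aa in aligned_protein)


# Two aligned proteins and their (hypothetical) coding sequences:
print(thread_codons("M-KV", "ATGAAAGTTTAG"))  # ATG---AAAGTT
print(thread_codons("MRKV", "ATGCGTAAAGTT"))  # ATGCGTAAAGTT
```

Because every gap becomes exactly one gapped codon, the DNA rows stay aligned codon by codon, which is what the dN/dS tools downstream require.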

PROTOGENE (in TCoffee) is time-consuming. Please submit your email address, and the results will be emailed to you.
PROTOGENE may return more than one DNA sequence for any given protein sequence. For your homework assignment, please choose one sequence for each species.

14
(Result) Codon alignment

>gi|604533|gb|AAC37231.1|_G_L36554 _S_ AAC37231 _DESC_ fertilization protein MATCHES_ON Haliotis assimilis fertilization protein mRNA, complete cds
ATGAGGTCTTTGGTGCTTCTCTGTGTTTTGCTGATGGCAATATGTGCGGCGGAC------
------------------AAAAAAACCTCGGTCTCGAAGGAAAATGAAGCCGCAATGAAG
GTAGCGATGATGAAGTTTTTGGATATGAAGGCGGGTGTATTCAAAGAAATC---ATTGAG
GATATGGGATATCCAATAACCCCTCCGCAATGGACAACTCTACTGTACTACAACAGAGAG
AGATTGATTGAATTTTGCCGTTCCTTCCTTGCATTGTCCAAAAAGATTATATTGCTGGGA
GGTAACAAATTAAATAAGGCGAACTTCGCTAGGATGGGTCGAATCCTTGGCTGGAAAAGC
CAGTGGGCTGTGAGACAGAGGCAATGGGGGATGGTCAGA---------GTGTCGAGGCGC
CATACAAGTACTGCAATAGCTAAAAGGATCGTCGCCATGAAAGTTGCTGACCTACCCTGT
AAC------------------TAG
>gi|604531|gb|AAC37233.1|_G_L36590 _S_ AAC37233 _DESC_ fertilization protein MATCHES_ON Haliotis corrugata fertilization protein mRNA, complete cds
ATGAGGTTTTTGCTGCTTCTCTGTGTTTTGATGGGGGCAGTATCTCAGGCAGTATGCAGA
AAAAGACCTAATGTCTGGGGGAAAATCGTGGTCAAGGAGAAAAATAAAGCCGCAATGAAG
ATAGGGTTTATGGAATATTTGGATGCAAAGTTGGTAAAGTTTAAAAGGCACTGGCTTGTT
GGAGCCAATTGGAAACTTCAAAAATTTGAAACGGATGAAATGAGATACCTCGCCATAAAG
AGACTGATAAAAGTTTGCCATGGATACACTATTTGGTCCCAACGACTAATAATGTTAAAA
TATCGACCATTGAATGAGAAATACTTCAAAAAGGTGGGTCGATACCTTGCCTGGCGAAAC
TACCTCATAGTTTTTCGGATGTGGATCGGCGTTTTG------AAGAAAAATCTTAAAAGA
TCGGAAATAACGAAACCCATGCAAAAACTCCTCGACACAAAGGATGGTGAGTTGCCCTGC

15
SNAP - Ds/Dn Calculation Tool

http://hcv.lanl.gov/content/sequence/SNAP/SNAP.html
Calculates synonymous and nonsynonymous substitution rates from codon alignments using the method of Nei and Gojobori (1986).
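The Nei and Gojobori (1986) calculation can be sketched in four steps: count synonymous (S) and nonsynonymous (N) sites per codon; count synonymous (Sd) and nonsynonymous (Nd) differences per codon pair, averaging over the possible mutational pathways when codons differ at more than one position; compute pS = Sd/S and pN = Nd/N; then apply the Jukes-Cantor correction to get dS (Ks) and dN (Ka). This is a minimal sketch, not SNAP's code. It assumes the standard genetic code, skips codons containing gaps or stops, counts changes to stop codons as nonsynonymous sites, and discards pathways through intermediate stop codons. SNAP's handling of these edge cases may differ.

```python
import itertools
import math

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons in TCAG order; '*' marks stops.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_sites(codon):
    """Synonymous sites of a codon: sum over positions of (synonymous changes / 3)."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3
    return s


def codon_differences(c1, c2):
    """(synonymous, nonsynonymous) differences, averaged over mutational pathways."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    total_sd = total_nd = 0.0
    n_paths = 0
    for order in itertools.permutations(positions):
        current, sd, nd, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":  # discard pathways through a stop codon
                valid = False
                break
            if CODE[nxt] == CODE[current]:
                sd += 1
            else:
                nd += 1
            current = nxt
        if valid:
            total_sd += sd
            total_nd += nd
            n_paths += 1
    if n_paths == 0:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    return total_sd / n_paths, total_nd / n_paths


def jukes_cantor(p):
    """Jukes-Cantor correction; undefined (NaN) when p >= 0.75."""
    return -0.75 * math.log(1 - 4 * p / 3) if p < 0.75 else float("nan")


def nei_gojobori(seq1, seq2):
    """Return (dS, dN) for two aligned, in-frame coding sequences."""
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1) - len(seq1) % 3, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        # Skip codons with gaps or ambiguity codes (not in CODE) and stop codons.
        if CODE.get(c1, "*") == "*" or CODE.get(c2, "*") == "*":
            continue
        s1, s2 = synonymous_sites(c1), synonymous_sites(c2)
        S += (s1 + s2) / 2
        N += 3 - (s1 + s2) / 2
        sd, nd = codon_differences(c1, c2)
        Sd += sd
        Nd += nd
    return jukes_cantor(Sd / S), jukes_cantor(Nd / N)


dS, dN = nei_gojobori("ATGCTTAAAGGGCCC", "ATGCTCAAAGGGCCC")
```

In this toy pair the only difference (CTT to CTC, both Leu) is synonymous, so dN is zero and dS is positive. Positive selection is inferred when dN exceeds dS, i.e. dN/dS > 1.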

16
Input codon alignment
Select output statistics
17
SNAP - Ds/Dn Calculation Tool

Conclusion: We detect positive selection (dN > dS, i.e. dN/dS > 1) in six of the comparisons, as did Swanson and Vacquier (1998).

18
Distmat
http://emboss.bioinformatics.nl/cgi-bin/emboss/distmat
Distmat calculates the evolutionary distance between every pair of sequences in a multiple alignment. The distances are expressed as the number of substitutions per 100 nucleotides or the number of replacements per 100 amino acids.
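The kind of number Distmat reports can be sketched as follows: for each pair of aligned sequences, count mismatches among the sites where neither sequence has a gap, correct for multiple hits, and scale to substitutions per 100 sites. This minimal sketch assumes the Jukes-Cantor correction and simply ignores gapped sites. Distmat itself offers several correction methods and its own gap handling, so its values need not match. The function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, Tuple


def jc_distance_per_100(a: str, b: str) -> float:
    """Jukes-Cantor distance between two aligned DNA sequences, per 100 sites."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return float("nan")  # no comparable sites
    p = sum(x != y for x, y in pairs) / len(pairs)
    if p >= 0.75:
        return float("inf")  # too saturated: the correction is undefined
    return 100 * -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Distances for every pair of named, aligned sequences."""
    return {(m, n): jc_distance_per_100(alignment[m], alignment[n])
            for m, n in combinations(alignment, 2)}


aln = {"s1": "ATGCTTAAA", "s2": "ATGCTCAAG", "s3": "ATG---AAA"}
for (m, n), d in distance_matrix(aln).items():
    print(m, n, round(d, 2))
```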
19
Distmat

Feed the DNA (codon) alignment of the 18-kDa protein into Distmat.
Calculate the distances between the sequences separately for codon positions 1 and 2, and for codon position 3.
Are the results in agreement with those from the dN/dS analysis?
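Splitting an alignment by codon position is simple string slicing once the alignment is in frame. A minimal sketch, assuming each aligned sequence starts at the first position of a codon; the helper names are hypothetical, and the uncorrected p-distance is used only to keep the example short.

```python
def codon_positions(seq: str, positions: tuple) -> str:
    """Keep only the given codon positions (1-based) of an in-frame aligned sequence."""
    return "".join(seq[i] for i in range(len(seq)) if (i % 3) + 1 in positions)


def p_distance(a: str, b: str) -> float:
    """Uncorrected proportion of differing sites, ignoring sites with a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if "-" not in (x, y)]
    return sum(x != y for x, y in pairs) / len(pairs) if pairs else float("nan")


s1, s2 = "ATGCTTAAA", "ATGCTCAAG"
first_second = p_distance(codon_positions(s1, (1, 2)), codon_positions(s2, (1, 2)))
third = p_distance(codon_positions(s1, (3,)), codon_positions(s2, (3,)))
```

Because most third-position changes are synonymous, third-position distances are expected to behave more like dS, and first plus second positions more like dN.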

20
Distmat
21
Distmat
22
Anchored multiple-sequence alignment with DIALIGN
http://dialign.gobics.de/anchor/submission.php
User manual
http://dialign.gobics.de/anchor/manual
23

Align the following sequences (use the file dalign_sequences.txt):
>seq1
WKKNADAPKRAMTSFMKAAY
>seq2
WNLDTNSPEEKQAYIQLAKDDRIRYD
>seq3
WRMDSNQKNPDSNNPKAAYNKGDANAPK

24
Results

DIALIGN builds alignments from fragments: gap-free local segment pairs between sequences.
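To make "fragment" concrete, the toy sketch below enumerates every gap-free segment pair of a fixed length between two sequences and ranks the pairs by identical residues. This is only an illustration of what a fragment is: real DIALIGN scores fragments of varying length with a probability-based weight and then selects a consistent set of them, which this sketch does not attempt. The function name `best_fragments` is hypothetical.

```python
from typing import List, Tuple


def best_fragments(s1: str, s2: str, length: int, top: int = 3) -> List[Tuple[int, int, int]]:
    """(identities, start1, start2) of the best gap-free segment pairs; starts are 1-based."""
    hits = []
    for i in range(len(s1) - length + 1):
        for j in range(len(s2) - length + 1):
            ident = sum(a == b for a, b in zip(s1[i:i + length], s2[j:j + length]))
            hits.append((ident, i + 1, j + 1))
    hits.sort(key=lambda h: (-h[0], h[1], h[2]))  # most identities first
    return hits[:top]


seq1 = "WKKNADAPKRAMTSFMKAAY"
seq3 = "WRMDSNQKNPDSNNPKAAYNKGDANAPK"
print(best_fragments(seq1, seq3, 4, top=1))  # [(4, 17, 16)]  (KAAY in both)
```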

25
Results

Numbers below the alignment give a rough measure of the degree of local similarity among the sequences.

26
Anchored alignment

Now, let us assume that the user has some expert knowledge about a certain domain that is present in all the input sequences.
The domains marked in red in the three sequences are thought to be homologous to one another.

>seq1
WKKNADAPKRAMTSFMKAAY
>seq2
WNLDTNSPEEKQAYIQLAKDDRIRYD
>seq3
WRMDSNQKNPDSNNPKAAYNKGDANAPK
27

Therefore, the user wants to define this domain as an anchor and align the rest of the sequences automatically.
To do so, the user specifies a set of anchor points. Each anchor point is an equal-length segment pair involving two of the input sequences and is defined by:

first sequence involved
second sequence involved
start of anchor in first sequence
start of anchor in second sequence
length of anchor
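The five values above can be checked before submission by extracting the two segments they define. A minimal sketch, assuming 1-based sequence numbers and positions. The anchor coordinates in the example are hypothetical (not the red domain from the slide), and the exact anchor-file layout DIALIGN expects, for instance whether it needs an extra column, should be checked against the user manual linked earlier.

```python
from typing import Dict, NamedTuple


class Anchor(NamedTuple):
    seq_a: int    # first sequence involved (1-based index in the input)
    seq_b: int    # second sequence involved
    start_a: int  # start of anchor in first sequence (1-based)
    start_b: int  # start of anchor in second sequence (1-based)
    length: int   # length of anchor (the same in both sequences)


def check_anchor(anchor: Anchor, seqs: Dict[int, str]) -> str:
    """Validate an anchor and return the two anchored segments as 'SEG_A/SEG_B'."""
    a, b = seqs[anchor.seq_a], seqs[anchor.seq_b]
    for seq, start in ((a, anchor.start_a), (b, anchor.start_b)):
        if start < 1 or start + anchor.length - 1 > len(seq):
            raise ValueError(f"{anchor} runs past the end of a sequence")
    seg_a = a[anchor.start_a - 1:anchor.start_a - 1 + anchor.length]
    seg_b = b[anchor.start_b - 1:anchor.start_b - 1 + anchor.length]
    return f"{seg_a}/{seg_b}"


def anchor_line(anchor: Anchor) -> str:
    """The five fields in the order listed above, separated by spaces."""
    return " ".join(str(v) for v in anchor)


seqs = {1: "WKKNADAPKRAMTSFMKAAY",
        2: "WNLDTNSPEEKQAYIQLAKDDRIRYD",
        3: "WRMDSNQKNPDSNNPKAAYNKGDANAPK"}
example = Anchor(1, 3, 17, 16, 4)  # hypothetical anchor
print(check_anchor(example, seqs))  # KAAY/KAAY
print(anchor_line(example))         # 1 3 17 16 4
```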

29
Results

The specified domain is aligned and the remainder
of the sequences is aligned automatically
respecting the constraints given by the anchor
points

30
Guidance/HoT
31
>seq1
WKKNADAPKRAMTSFMKAAY
>seq2
WNLDTNSPEEKQAYIQLAKDDRIRYD
>seq3
WRMDSNQKNPDSNNPKAAYNKGDANAPK
>seq4
WRMDSNQKNPNNPKAAYNKGDANAPK