Multiple Sequence Alignment - PowerPoint PPT Presentation

Provided by: Niv76
Learn more at: http://nsmn1.uh.edu

1
Multiple Sequence Alignment
  • ClustalW
  • TCoffee
  • Ka, Ks, and Ka/Ks
  • Anchored alignment

2
ClustalW
  • http://www.ebi.ac.uk/clustalw/

3
ClustalW
Paste your sequences
Multiple sequence alignment options
Submit
4
Exercise
  • HomoloGene is a system for automated detection of
    homologs among annotated genes of several
    completely sequenced eukaryotic genomes.
  • Download the FASTA sequences of HomoloGene:5276
    and align them with ClustalW.

5
Download protein sequences
6
Result
Alignment
Guide Tree
7
TCoffee
  • http://tcoffee.crg.cat/

TCoffee computes its alignments by combining a
collection of smaller alignments.
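
The idea of combining smaller alignments can be illustrated with a toy "library" that records how many pairwise alignments place two residues in the same column. This is only a sketch: the function names and the toy sequences are made up for illustration, and TCoffee's real library construction and weighting are more involved.

```python
from collections import Counter

def aligned_pairs(name_a, row_a, name_b, row_b):
    """Yield ((name_a, i), (name_b, j)) for residues sharing a column."""
    i = j = 0  # 0-based residue indices in the ungapped sequences
    for ca, cb in zip(row_a, row_b):
        if ca != "-" and cb != "-":
            yield (name_a, i), (name_b, j)
        if ca != "-":
            i += 1
        if cb != "-":
            j += 1

def build_library(pairwise_alignments):
    """Count how many pairwise alignments support each residue pair."""
    library = Counter()
    for (name_a, row_a), (name_b, row_b) in pairwise_alignments:
        if len(row_a) != len(row_b):
            raise ValueError("aligned rows must have equal length")
        library.update(aligned_pairs(name_a, row_a, name_b, row_b))
    return library

# Two alternative alignments of the same (made-up) sequence pair
alignments = [
    (("A", "GAR-FIELD"), ("B", "GARAFIELD")),
    (("A", "GARFIELD-"), ("B", "GARAFIELD")),
]
library = build_library(alignments)
print(library[(("A", 0), ("B", 0))])  # pairs supported by both alignments count twice
```

Residue pairs supported by several small alignments receive higher counts, which is the kind of evidence a consistency-based aligner can use when building the final multiple alignment.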
8
Alignment at the DNA level based on an alignment
at the Protein Level
  • The 18-kDa protein plays an important role in
    fertilization of several abalone species
  • Build a multiple sequence alignment using the
    following sequences

9
Sequences
>gi|604533|gb|AAC37231.1| fertilization protein
MRSLVLLCVLLMAICAADKKTSVSKENEAAMKVAMMKFLDMKAGVFKEIIEDMGYPITPPQWTTLLYYNR
ERLIEFCRSFLALSKKIILLGGNKLNKANFARMGRILGWKSQWAVRQRQWGMVRVSRRHTSTAIAKRIVA
MKVADLPCN
>gi|604531|gb|AAC37233.1| fertilization protein
MRFLLLLCVLMGAVSQAVCRKRPNVWGKIVVKEKNKAAMKIGFMEYLDAKLVKFKRHWLVGANWKLQKFE
TDEMRYLAIKRLIKVCHGYTIWSQRLIMLKYRPLNEKYFKKVGRYLAWRNYLIVFRMWIGVLKKNLKRSE
ITKPMQKLLDTKDGELPCPVRKIHG
>gi|604529|gb|AAC37232.1| fertilization protein
MRSLVLLCVLMAVGCVAFDDVVVSRQEQSYVQRGMVNFLDEEMHKLVKRFRDMRWNLGPGFVFLLKKVNR
ERMMRYCMDYARYSKKILQLKHLPVNKKTLTKMGRFVGYRNYGVIRELYADVFRDVQGFRGPKMTAAMRK
YSSKDPGTFPCKNEKRRG
>gi|604527|gb|AAC37230.1| fertilization protein
MRSLVLLCVLLMAICAADKKTTVSKENAAAMKIAMIKFLDARAGKFKKRVENMGYPITPPQWTTLLYYNR
QRLMEWCHTYVEFSKKIILMGGNKLNKKNFTRMGRIIGWKNQWVLKRRQWEMVRVMRRYKSTAIAKKIVA
MKVADLPCN
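
Before pasting sequences into a web form, it can help to check that the FASTA text parses cleanly. The following is a minimal sketch of a FASTA reader; the function name `parse_fasta` and the short example records are illustrative, not part of any tool mentioned here.

```python
def parse_fasta(text):
    """Return a dict mapping each header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            records[header].append(line)
    # Join wrapped sequence lines into one string per record
    return {h: "".join(parts) for h, parts in records.items()}

example = """>seq1
WKKNADAPKRAMTSFMKAAY
>seq2
WNLDTNSPEEKQAYIQLAKD
DRIRYD
"""
records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq))
```

A quick look at the record count and sequence lengths catches most copy-and-paste problems, such as a missing header line.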

10
Choose TCoffee Regular, paste the sequences in
the data box, and press submit
11
Download formats
Guide tree
12
Codon Alignment
  • In order to study selection patterns, you will
    need to have the corresponding DNA alignment
  • Using PROTOGENE (Protein-to-Gene) in TCoffee,
    the amino-acid alignment is transformed into a
    codon alignment. The actual procedure involves
    tBLASTn.
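
The core transformation can be sketched as threading each coding sequence back onto its gapped protein row, three nucleotides per residue. The sketch assumes the matching coding sequence is already known and in frame (PROTOGENE finds it via tBLASTn); the function name and toy data are illustrative only.

```python
def protein_to_codon_alignment(aligned_protein, cds):
    """Thread an ungapped, in-frame coding sequence onto a gapped protein row."""
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(cds) < 3 * n_residues:
        raise ValueError("coding sequence is too short for the protein row")
    codons = []
    pos = 0
    for aa in aligned_protein:
        if aa == "-":
            codons.append("---")        # one protein gap becomes three gap characters
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)

row = "MR-SL"            # gapped protein row (toy example)
cds = "ATGAGGTCTTTG"     # ATG AGG TCT TTG
print(protein_to_codon_alignment(row, cds))  # ATGAGG---TCTTTG
```

The sketch does not verify that the codons actually translate to the protein residues, and it ignores any trailing stop codon.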

13
  • PROTOGENE (in TCoffee) is time consuming. Please
    submit your email address, and the results will
    be emailed to you.
  • PROTOGENE may return more than one DNA sequence
    for any given protein sequence. For your homework
    assignment, please choose one sequence for each
    species.

14
(Result) Codon alignment
>gi|604533|gb|AAC37231.1|_G_L36554 _S_ AAC37231 _DESC_ fertilization protein MATCHES_ON Haliotis assimilis fertilization protein mRNA, complete cds
ATGAGGTCTTTGGTGCTTCTCTGTGTTTTGCTGATGGCAATATGTGCGGCGGAC------
------------------AAAAAAACCTCGGTCTCGAAGGAAAATGAAGCCGCAATGAAG
GTAGCGATGATGAAGTTTTTGGATATGAAGGCGGGTGTATTCAAAGAAATC---ATTGAG
GATATGGGATATCCAATAACCCCTCCGCAATGGACAACTCTACTGTACTACAACAGAGAG
AGATTGATTGAATTTTGCCGTTCCTTCCTTGCATTGTCCAAAAAGATTATATTGCTGGGA
GGTAACAAATTAAATAAGGCGAACTTCGCTAGGATGGGTCGAATCCTTGGCTGGAAAAGC
CAGTGGGCTGTGAGACAGAGGCAATGGGGGATGGTCAGA---------GTGTCGAGGCGC
CATACAAGTACTGCAATAGCTAAAAGGATCGTCGCCATGAAAGTTGCTGACCTACCCTGT
AAC------------------TAG
>gi|604531|gb|AAC37233.1|_G_L36590 _S_ AAC37233 _DESC_ fertilization protein MATCHES_ON Haliotis corrugata fertilization protein mRNA, complete cds
ATGAGGTTTTTGCTGCTTCTCTGTGTTTTGATGGGGGCAGTATCTCAGGCAGTATGCAGA
AAAAGACCTAATGTCTGGGGGAAAATCGTGGTCAAGGAGAAAAATAAAGCCGCAATGAAG
ATAGGGTTTATGGAATATTTGGATGCAAAGTTGGTAAAGTTTAAAAGGCACTGGCTTGTT
GGAGCCAATTGGAAACTTCAAAAATTTGAAACGGATGAAATGAGATACCTCGCCATAAAG
AGACTGATAAAAGTTTGCCATGGATACACTATTTGGTCCCAACGACTAATAATGTTAAAA
TATCGACCATTGAATGAGAAATACTTCAAAAAGGTGGGTCGATACCTTGCCTGGCGAAAC
TACCTCATAGTTTTTCGGATGTGGATCGGCGTTTTG------AAGAAAAATCTTAAAAGA
TCGGAAATAACGAAACCCATGCAAAAACTCCTCGACACAAAGGATGGTGAGTTGCCCTGC

15
SNAP - Ds/Dn Calculation Tool
  • http://hcv.lanl.gov/content/sequence/SNAP/SNAP.html
  • Calculates synonymous and nonsynonymous
    substitution rates from codon alignments,
    according to the method of Nei and Gojobori
    (1986).
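
To make the counting concrete, here is a simplified sketch in the spirit of the Nei and Gojobori approach for a single pair of aligned coding sequences. It is not SNAP's implementation. The assumptions are: the standard genetic code, changes to stop codons counted as nonsynonymous when counting sites, mutation paths through stop codons skipped, codons with gaps or stop codons ignored, and a Jukes-Cantor correction at the end. All function names are illustrative.

```python
from itertools import permutations, product
from math import log

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def syn_sites(codon):
    """Expected number of synonymous sites in one codon (between 0 and 3)."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3
    return s

def count_differences(c1, c2):
    """Average (synonymous, nonsynonymous) changes over all mutation paths."""
    diff = [i for i in range(3) if c1[i] != c2[i]]
    syn_total = non_total = 0.0
    n_paths = 0
    for order in permutations(diff):
        current, syn, non, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                valid = False          # skip paths through stop codons
                break
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        if valid:
            syn_total += syn
            non_total += non
            n_paths += 1
    if n_paths == 0:                   # simplification: no usable path
        return 0.0, 0.0
    return syn_total / n_paths, non_total / n_paths

def jukes_cantor(p):
    """Jukes-Cantor correction; only defined for p < 0.75."""
    return -0.75 * log(1 - 4 * p / 3)

def dn_ds(seq1, seq2):
    """Return (dN, dS) for two aligned, in-frame coding sequences."""
    S = Sd = Nd = 0.0
    n_codons = 0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODE or c2 not in CODE:
            continue                   # gaps or ambiguous bases
        if "*" in (CODE[c1], CODE[c2]):
            continue                   # ignore stop codons
        S += (syn_sites(c1) + syn_sites(c2)) / 2
        sd, nd = count_differences(c1, c2)
        Sd += sd
        Nd += nd
        n_codons += 1
    N = 3 * n_codons - S
    return jukes_cantor(Nd / N), jukes_cantor(Sd / S)

dn, ds = dn_ds("ATGGGGTTTCTGGCA", "ATGGGATTTCTGGCA")
print(dn, ds)
```

A dN/dS ratio above 1 for a pair of sequences is the usual indication of positive selection. The sketch fails on very short or very divergent inputs (division by zero, or a proportion of 0.75 or more), which real tools handle explicitly.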

16
Input codon alignment
Select output statistics
17
SNAP - Ds/Dn Calculation Tool
  • Conclusion: We detect positive selection in six
    of the comparisons. So did Swanson and Vacquier
    (1998).

18
Distmat
http://emboss.bioinformatics.nl/cgi-bin/emboss/distmat
Distmat calculates the evolutionary distances
between every pair of sequences in a multiple
alignment. The distances are expressed as the
number of substitutions per 100 nucleotides or the
number of replacements per 100 amino acids.
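
As a rough illustration of what "per 100 positions" means, the sketch below computes an uncorrected distance between two aligned rows. It assumes that columns containing a gap are skipped and applies no multiple-substitution correction; Distmat's own gap handling and correction methods may differ, and the function names are illustrative.

```python
def distance_per_100(row_a, row_b, gap="-"):
    """Uncorrected differences per 100 compared (gap-free) columns."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    compared = mismatches = 0
    for a, b in zip(row_a.upper(), row_b.upper()):
        if a == gap or b == gap:
            continue               # assumption: gapped columns are ignored
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * mismatches / compared

def distance_matrix(alignment):
    """Pairwise distances for a dict of name -> aligned row."""
    names = list(alignment)
    return {(a, b): distance_per_100(alignment[a], alignment[b])
            for a in names for b in names}

toy = {"s1": "ACGT-A", "s2": "ACGA-A", "s3": "ACGTTA"}
matrix = distance_matrix(toy)
print(matrix[("s1", "s2")])  # 20.0: one mismatch in five compared columns
```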
19
Distmat
  • Feed the DNA alignment of the 18-kDa protein into
    distmat.
  • Calculate separately the distances between the
    sequences for codon positions 1 and 2, and for
    codon position 3.
  • Are the results in agreement with those from the
    dn/ds analysis?
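
To see what is being compared, the sketch below splits one row of an in-frame codon alignment into its codon positions 1 and 2 and its codon position 3. Distmat may be able to make this selection itself (check its base-position setting); the function name here is illustrative.

```python
def split_codon_positions(aligned_cds):
    """Return (positions 1+2, position 3) of an in-frame aligned row."""
    pos12 = "".join(aligned_cds[i:i + 2] for i in range(0, len(aligned_cds), 3))
    pos3 = aligned_cds[2::3]
    return pos12, pos3

row = "ATGAGG---TCT"
pos12, pos3 = split_codon_positions(row)
print(pos12)  # ATAG--TC
print(pos3)   # GG-T
```

Most changes at the third codon position are synonymous, while nearly all changes at the second position and most at the first alter the amino acid, so the two distance sets roughly parallel dS and dN.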

20
Distmat
21
Distmat
22
Anchored multiple-sequence alignment with DIALIGN
http://dialign.gobics.de/anchor/submission.php
User manual
http://dialign.gobics.de/anchor/manual
23
  • Align the following sequences (use the file
    dalign_sequences.txt)
>seq1
WKKNADAPKRAMTSFMKAAY
>seq2
WNLDTNSPEEKQAYIQLAKDDRIRYD
>seq3
WRMDSNQKNPDSNNPKAAYNKGDANAPK


24
Results
  • DIALIGN makes alignments from fragments

25
Results
  • Numbers below the alignment reflect some rough
    degree of local similarity among the sequences

26
Anchored alignment
  • Now, let us assume that the user has some expert
    knowledge concerning a certain domain that is
    present in all the input sequences
  • The domains marked in red in the three sequences
    are thought to be homologous to one another

>seq1
WKKNADAPKRAMTSFMKAAY
>seq2
WNLDTNSPEEKQAYIQLAKDDRIRYD
>seq3
WRMDSNQKNPDSNNPKAAYNKGDANAPK
27
  • Therefore, the user wants to define this domain
    as an anchor and align the rest of the sequences
    automatically.
  • The anchor is specified as a set of anchor
    points. Each anchor point is an equal-length
    segment pair involving two of the input
    sequences.

28
  • first sequence involved
  • second sequence involved
  • start of anchor in first sequence
  • start of anchor in second sequence
  • length of anchor
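
The five fields above can be modelled as a small record and checked against the input sequences before submission. This is a sketch under stated assumptions: sequence numbers and start positions are taken to be 1-based, the example anchor (the shared KAAY segment in seq1 and seq3) is a hypothetical choice, and the exact input syntax expected by DIALIGN should be taken from its user manual.

```python
from typing import NamedTuple

class AnchorPoint(NamedTuple):
    seq1: int    # first sequence involved (1-based, assumed)
    seq2: int    # second sequence involved
    start1: int  # start of anchor in first sequence (1-based, assumed)
    start2: int  # start of anchor in second sequence
    length: int  # length of anchor

def anchored_segments(anchor, sequences):
    """Return the two equal-length segments that an anchor point ties together."""
    if anchor.seq1 == anchor.seq2:
        raise ValueError("an anchor point must involve two different sequences")
    if anchor.length < 1:
        raise ValueError("anchor length must be positive")
    segments = []
    for seq_no, start in ((anchor.seq1, anchor.start1),
                          (anchor.seq2, anchor.start2)):
        if not 1 <= seq_no <= len(sequences):
            raise ValueError("unknown sequence number")
        seq = sequences[seq_no - 1]
        end = start - 1 + anchor.length
        if start < 1 or end > len(seq):
            raise ValueError("anchor runs past the end of a sequence")
        segments.append(seq[start - 1:end])
    return tuple(segments)

sequences = [
    "WKKNADAPKRAMTSFMKAAY",
    "WNLDTNSPEEKQAYIQLAKDDRIRYD",
    "WRMDSNQKNPDSNNPKAAYNKGDANAPK",
]
anchor = AnchorPoint(seq1=1, seq2=3, start1=17, start2=16, length=4)
print(anchored_segments(anchor, sequences))  # ('KAAY', 'KAAY')
```

Printing the two segments before submitting makes off-by-one errors in the start positions easy to spot.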

29
Results
  • The specified domain is aligned and the remainder
    of the sequences is aligned automatically
    respecting the constraints given by the anchor
    points

30
Guidance/HoT
31
>seq1
WKKNADAPKRAMTSFMKAAY
>seq2
WNLDTNSPEEKQAYIQLAKDDRIRYD
>seq3
WRMDSNQKNPDSNNPKAAYNKGDANAPK
>seq4
WRMDSNQKNPNNPKAAYNKGDANAPK