Provided by: ssi9

1
Multiple Sequence Alignment
2
Definition

Homology: related by descent
Homologous sequence positions

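[Figure: tree linking unknown ancestral sequences (?) to the sequences ATTGCGC and ATCCGC; the gapped form AT-CCGC lines the shorter sequence up with ATTGCGC]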
3
Reasons for aligning sets of sequences

Organise data to reflect sequence homology
Estimate evolutionary distance
Infer phylogenetic trees from homologous sites
Highlight conserved sites/regions
Highlight variable sites/regions
Uncover changes in gene structure
Look for evidence of selection
Summarise information

4
Alignments help to
Organise
Visualise
Analyze
Sequence Data
5

The process of aligning sequences is a game
involving playing off gaps and mismatches

6
Ways of aligning multiple sequences

By hand
Automated
Combination

7
Definition

Optimality criteria: some kind of rule or scoring
scheme that helps you decide which alignment you
consider to be the best

8
Pairwise vs Multiple Sequences

Pairs of sequences are typically aligned using
exhaustive algorithms (dynamic programming)
Complexity of exhaustive methods is O(2^n m^n),
where n = number of sequences and m = sequence length
Multiple sequence alignment uses heuristic methods
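
The exhaustive pairwise case can be sketched as a small dynamic-programming routine. This is a minimal Needleman-Wunsch style global alignment with a linear gap penalty; the function name and the scores (match 1, mismatch -1, gap -1) are illustrative assumptions, not values from the slides.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First column and row: a prefix aligned entirely against gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def substitution(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell is the best of diagonal, up and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + substitution(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + substitution(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


total, top, bottom = needleman_wunsch("ATTGCGC", "ATCCGC")
print(total)
print(top)
print(bottom)
```

The table has (len(a)+1) x (len(b)+1) cells, which is why the same idea becomes impractical when extended to many sequences at once.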

9
The Correct Alignment
[Figure: tree linking unknown ancestral sequences (?) to the sequences ATTGCGC and ATCCGC]
10
The Correct Alignment
11

Sequence alignment is easy with sufficiently
closely related sequences
Below a certain level of identity, sequence
alignment may become meaningless
Twilight zone for amino acid sequences: around 30% identity
In the twilight zone it is good to make use of
additional information if possible (e.g.
structure)

12
Consensus Sequences

Simplest form: a single sequence which represents
the most common amino acid/base at each position
Y D D G A V - E A L
Y D G G - - - E A L
F E G G I L V E A L
F D - G I L V Q A V
Y E G G A V V Q A L
Y D G G A/I V/L V E A L
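
The consensus row above can be reproduced with a short majority-vote sketch. It assumes that gaps are ignored when counting and that ties are reported with a slash, which matches the example but is not stated on the slide; the function name is hypothetical.

```python
from collections import Counter


def consensus(alignment, gap="-"):
    """Return one consensus entry per column of an alignment."""
    result = []
    for column in zip(*alignment):
        # Count residues only; gap characters do not vote.
        counts = Counter(res for res in column if res != gap)
        if not counts:
            result.append(gap)
            continue
        top = max(counts.values())
        # Keep every residue that reaches the top count, in order of first appearance.
        winners = [res for res in counts if counts[res] == top]
        result.append("/".join(winners))
    return result


rows = ["YDDGAV-EAL", "YDGG---EAL", "FEGGILVEAL", "FD-GILVQAV", "YEGGAVVQAL"]
print(" ".join(consensus(rows)))
```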

13
Multiple Alignment Formats

e.g. Clustal, Phylip, MSF, MEGA, etc.

14
Clustal Format

CLUSTAL X (1.81) multiple sequence alignment
CAS1_BOVIN   MKLLILTCLVAVALARPKHPIKHQGLPQ--------EVLNEN-
CAS1_SHEEP   MKLLILTCLVAVALARPKHPIKHQGLSP--------EVLNEN-
CAS1_PIG     MKLLIFICLAAVALARPKPPLRHQEHLQNEPDSRE--------
CAS1_HUMAN   MRLLILTCLVAVALARPKLPLRYPERLQNPSESSE--------
CAS1_RABBIT  MKLLILTCLVATALARHKFHLGHLKLTQEQPESSEQEILKERK
CAS1_MOUSE   MKLLILTCLVAAAFAMPRLHSRNAVSSQTQ------QQHSSSE
CAS1_RAT     MKLLILTCLVAAALALPRAHRRNAVSSQTQ-------------
.. .

15
Phylip Format (Interleaved)

7 100
SOMA_BOVIN MMAAGPRTSL LLAFALLCLP WTQVVGAFPA MSLSGLFANA VLRAQHLHQL
SOMA_SHEEP MMAAGPRTSL LLAFTLLCLP WTQVVGAFPA MSLSGLFANA VLRAQHLHQL
SOMA_RAT_P -MAADSQTPW LLTFSLLCLL WPQEAGAFPA MPLSSLFANA VLRAQHLHQL
SOMA_MOUSE -MATDSRTSW LLTVSLLCLL WPQEASAFPA MPLSSLFSNA VLRAQHLHQL
SOMA_RABIT -MAAGSWTAG LLAFALLCLP WPQEASAFPA MPLSSLFANA VLRAQHLHQL
SOMA_PIG_P -MAAGPRTSA LLAFALLCLP WTREVGAFPA MPLSSLFANA VLRAQHLHQL
SOMA_HUMAN -MATGSRTSL LLAFGLLCLP WLQEGSAFPT IPLSRLFDNA MLRAHRLHQL

           AADTFKEFER TYIPEGQRYS -IQNTQVAFC FSETIPAPTG KNEAQQKSDL
           AADTFKEFER TYIPEGQRYS -IQNTQVAFC FSETIPAPTG KNEAQQKSDL
           AADTYKEFER AYIPEGQRYS -IQNAQAAFC FSETIPAPTG KEEAQQRTDM
           AADTYKEFER AYIPEGQRYS -IQNAQAAFC FSETIPAPTG KEEAQQRTDM
           AADTYKEFER AYIPEGQRYS -IQNAQAAFC FSETIPAPTG KDEAQQRSDM
           AADTYKEFER AYIPEGQRYS -IQNAQAAFC FSETIPAPTG KDEAQQRSDV
           AFDTYQEFEE AYIPKEQKYS FLQNPQTSLC FSESIPTPSN REETQQKSNL

16
Phylip Format (Sequential)

3 100
Rat
ATGGTGCACCTGACTGATGCTGAGAAGGCTGCTGTTAATGGCCG
TGGTGGCTGGAGTGGCCAGTGCCCTGGCTCACAAGTACCACTAA
Mouse
ATGGTGCACCTGACTGATGCTGAGAAGGCTGCTGTCTCTTGCCT
TGGGGAAAGGTGAACTCCGATGAAGTTGGTGGTGAGGCCCTGGG
Rabbit
ATGGTGCATCTGTCCAGT---GAGGAGAAGTCTGCGGTCACTGC
TGGGGCAAGGTGAATGTGGAAGAAGTTGGTGGTGAGGCCCTGGG

17
Mega Format

#mega
!TITLE No title;

#Rat      ATGGTGCACCTGACTGATGCTGAGAAGGCTGCTGT
#Mouse    ATGGTGCACCTGACTGATGCTGAGAAGGCTGCTGT
#Rabbit   ATGGTGCATCTGTCCAGT---GAGGAGAAGTCTGC
#Human    ATGGTGCACCTGACTCCT---GAGGAGAAGTCTGC
#Oppossum ATGGTGCACTTGACTTTT---GAGGAGAAGAACTG
#Chicken  ATGGTGCACTGGACTGCT---GAGGAGAAGCAGCT
#Frog     ---ATGGGTTTGACAGCACATGATCGT---CAGCT

18
Progressive Multiple Alignment

Heuristic
Perform pairwise alignments
Align sequences to alignments or alignments to
existing alignments (profile alignments)
Do the alignments in some sensible order

19
Iterative methods

Several progressive alignment methods can be
iterated
e.g. Barton-Sternberg, ClustalX

20
ClustalX Algorithm

Perform pairwise alignments and calculate
distances for all pairs of sequences
Construct guide tree (dendrogram) joining the
most similar sequences using Neighbour Joining
Align sequences, starting at the leaves of the
guide tree. This involves pairwise comparisons
as well as comparisons of a single sequence
with a group of sequences (a profile)
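
The first two steps (pairwise distances, then a guide tree) can be sketched as follows. This is a simplified stand-in, not the ClustalX implementation: it assumes the sequences are already the same length, counts a gap as a difference, and joins clusters by average linkage (UPGMA-like) instead of Neighbour Joining. The function names are hypothetical; the sequences are fragments from the MEGA format slide.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def guide_tree(seqs):
    """Build a nested-tuple tree by repeatedly joining the two closest clusters."""
    # Map each cluster (a name or a nested tuple) to the names it contains.
    clusters = {name: [name] for name in seqs}
    dist = {
        frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
        for pair in combinations(seqs, 2)
    }

    def cluster_distance(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs.
        pairs = [(x, y) for x in clusters[c1] for y in clusters[c2]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: cluster_distance(*p))
        members = clusters.pop(c1) + clusters.pop(c2)
        clusters[(c1, c2)] = members
    return next(iter(clusters))


seqs = {
    "Rat": "ATGGTGCACCTGACTGATGCTGAGAAGGCTGCTGT",
    "Mouse": "ATGGTGCACCTGACTGATGCTGAGAAGGCTGCTGT",
    "Rabbit": "ATGGTGCATCTGTCCAGT---GAGGAGAAGTCTGC",
    "Human": "ATGGTGCACCTGACTCCT---GAGGAGAAGTCTGC",
}
print(guide_tree(seqs))
```

The nested tuples give the order in which a progressive aligner would combine sequences and profiles. Here the identical Rat and Mouse fragments are joined first.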

21

ClustalX is not optimal
There are known areas in which ClustalX performs
badly, e.g.
Errors introduced early cannot be corrected by
subsequent information
Alignments of sequences of differing lengths
cause strange guide trees and unpredictable
effects
Edges: ClustalX does not penalise gaps at the edges
There are alternatives to ClustalX available
22
Using ClustalX

Start with sequences in FASTA format (or an
existing alignment in Clustal format)
Choose Do Alignment from the Alignment menu
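
For readers unfamiliar with the FASTA input format, a minimal reader might look like the sketch below. The function name is hypothetical, and taking the first word of the header line as the sequence name is an assumption of this sketch, not a rule of ClustalX.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: use the first word after '>' as the name.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # Sequence lines may be wrapped; collect them and join later.
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


example = """>Rat example fragment
ATGGTGCACCTGACTGATGC
TGAGAAGGCTGCTGT
>Mouse
ATGGTGCACCTGACTGATGC
"""
for seq_name, sequence in read_fasta(example).items():
    print(seq_name, len(sequence))
```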

24
ClustalX Parameters

Scoring Matrix
Gap opening penalty
Gap extension penalty
Protein gap parameters
Additional algorithm parameters
Secondary structure penalties

25
Score Matrices

Pairwise matrices and multiple alignment matrix
series
PAM (Dayhoff), BLOSUM (Henikoff), GONNET
(default), user defined
Transition (A<->G, C<->T)/transversion
(purine<->pyrimidine) ratio: low for distantly
related sequences

26
Gap Penalties

Linear gap penalties
Affine gap penalties: p = o + l*e
o = gap opening penalty
e = gap extension penalty
l = gap length
Protein-specific penalties (on by default)
Increase the probability of gaps associated with
certain residues
Increase the chances of gaps in loop regions (> 5
hydrophilic residues)
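
A small sketch of the two gap penalty schemes, using the affine form p = o + l*e. The numeric values are illustrative assumptions only and are not ClustalX defaults.

```python
def linear_gap_penalty(length, per_residue):
    """Every gapped position costs the same amount."""
    return length * per_residue


def affine_gap_penalty(length, opening, extension):
    """One opening cost per gap, plus an extension cost per gapped position."""
    return opening + length * extension


# Illustrative values (assumed): opening = 10, extension = 0.5.
one_long_gap = affine_gap_penalty(4, opening=10, extension=0.5)
two_short_gaps = 2 * affine_gap_penalty(2, opening=10, extension=0.5)
print(one_long_gap, two_short_gaps)
```

With an affine scheme, one gap of length 4 costs less than two gaps of length 2, so the aligner prefers a few long gaps to many scattered short ones.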

27
Algorithm parameters

Slow-accurate pair-wise alignment
Do alignment from guide tree
Reset gaps before aligning (iteration)
Delay divergent sequences (%)

28
Additional displays

Column Scores
Low quality regions
Exceptional residues

29
Multiple Alignment Strategies

Align pairs of sequences using an optimal method
Choose representative sequences to align
carefully
Choose sequences of comparable lengths
Use progressive alignment programs such as ClustalX
for multiple alignment
Progressive alignment programs may be combined
Review alignment by eye and edit

30
Alignment of coding regions

Nucleotide sequences are much harder to align
accurately than proteins
Protein-coding sequences can be aligned using the
protein sequences
e.g. in BioEdit: toggle translation to amino acids,
call ClustalW to align, edit the alignment by hand,
toggle back to nucleotides
In-frame nucleotide alignments can be used, e.g.
to determine non-synonymous and synonymous
distances separately
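
The "toggle back to nucleotide" step amounts to threading each coding sequence onto its aligned protein, three bases per residue and three gap characters per protein gap. The sketch below shows that idea only; the function name is hypothetical, the aligned protein is assumed to come from a separate alignment step, and the code does not check that the codons actually translate to the given residues.

```python
def thread_codons(aligned_protein, nucleotides, gap="-"):
    """Insert codon-sized gaps into a coding sequence to match an aligned protein."""
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(nucleotides) < 3 * n_residues:
        raise ValueError("coding sequence is too short for the protein")
    codons = []
    pos = 0
    for aa in aligned_protein:
        if aa == gap:
            codons.append(gap * 3)  # one protein gap becomes three nucleotide gaps
        else:
            codons.append(nucleotides[pos:pos + 3])
            pos += 3
    return "".join(codons)


# Rabbit fragment from the format slides, ungapped, with its aligned translation.
print(thread_codons("MVHLSS-EEK", "ATGGTGCATCTGTCCAGTGAGGAGAAG"))
```

Because gaps are only ever inserted in multiples of three, the resulting nucleotide alignment stays in frame.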

31
Multiple Alignments and Phylogenetic Trees

You can make a more accurate multiple sequence
alignment if you know the tree already
A good multiple sequence alignment is an
important starting point for drawing a tree
The process of constructing a multiple alignment
(unlike pair-wise) needs to take account of
phylogenetic relationships

32
Editing a multiple sequence alignment

It is NOT fraud to edit a multiple sequence
alignment
Incorporate additional knowledge if possible
Alignment editors help to keep the data
organised and help to prevent unwanted mistakes

33
Alignment Editors

e.g. GDE, BioEdit, Seaview, Jalview, etc.
Alignment editors can function as an
organisational tool (analysis tools in BioEdit)
Construct sub-sequences (GDE, Seaview)
Annotate sequences (Seaview)

34
Aligning weakly similar sequences
35
Sequence contains conserved regions

e.g. DIALIGN (Morgenstern, Dress, Werner)
re-aligns regions between conserved blocks
http://bibiserv.techfak.uni-bielefeld.de/
Useful if sequences contain consistent conserved
blocks
Block Maker searches for conserved words that
may be inconsistent http://blocks.fhcrc.org/

36
Profile Alignment

Gribskov et al. 1987
Position specific scores
Allows alignment of alignments
Gaps introduced as whole columns in the separate
alignments
Optimal alignment in time O(a^2 l^2), where
a = alphabet size and l = sequence length
Information about the degree of conservation of
sequence positions is included
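
A minimal sketch of position-specific scoring. It is a simplification, not the Gribskov method: it assumes log-odds scores from column frequencies with a pseudocount and a uniform background, and it only scores an ungapped sequence of the same length as the profile. All names and the pseudocount value are assumptions for illustration.

```python
import math
from collections import Counter


def build_profile(alignment, alphabet, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per alignment column."""
    background = 1.0 / len(alphabet)  # assumed uniform background frequency
    profile = []
    for column in zip(*alignment):
        # Gaps and unknown symbols are ignored when counting.
        counts = Counter(res for res in column if res in alphabet)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        profile.append({
            res: math.log2(((counts[res] + pseudocount) / total) / background)
            for res in alphabet
        })
    return profile


def score_sequence(profile, sequence):
    """Sum the column scores for an ungapped sequence as long as the profile."""
    if len(sequence) != len(profile):
        raise ValueError("sequence and profile must have the same length")
    return sum(column[res] for column, res in zip(profile, sequence))


profile = build_profile(["ATTGCGC", "ATTGCGC", "AT-CCGC"], "ACGT")
print(score_sequence(profile, "ATTGCGC"))
print(score_sequence(profile, "GCCATAT"))
```

Strongly conserved columns give large positive scores to the conserved residue, so the degree of conservation at each position feeds directly into the alignment score.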

37
Good reasons to use profile alignments

Adding a new sequence to an existing multiple
alignment that you want to keep the same (align
sequence to profile)
Searching a database for new members of your
protein family (pfsearch)
Searching a database of profiles to find out
which one your sequence belongs to (pfscan)
Combining two multiple sequence
alignments (profile to profile)

38
Profile Alignment Using ClustalX

Profile Alignment Mode
Align sequence to profile
Align profile 1 to profile 2
Secondary structure parameters

40
Profile searching using PSI-BLAST

Position-Specific Iterated
Perform search -> construct profile -> perform
search
Convergence (hopefully)
Increased sensitivity for distantly related
sequences
Available on-line (NCBI)

41
Databases of Aligned Sequences

Hovergen http://pbil.univ-lyon1.fr/databases/hovergen.html
(vertebrate alignments)
Pfam http://www.sanger.ac.uk/Software/Pfam/
(protein domain alignments and profile HMMs)
BLOCKS http://blocks.fhcrc.org/
Ribosomal Database Project http://rdp.cme.msu.edu/html/
(alignments and trees derived from rRNA sequences)
InterPro (combines information from other
sources)
Many more

42
Probabilistic Models of Sequence Alignment

Hidden Markov Models
sequence of states and associated symbol
probabilities
Produces a probabilistic model of a sequence
alignment
Align a sequence to a Profile Hidden Markov Model
Algorithms exist to find the most probable
path through the model (e.g. the Viterbi algorithm)

Markov Chain: a chain of things. The
probability of the next thing depends only on the
current thing.
Hidden Markov Model: a sequence of states which
form a Markov Chain. The states are not
observable. The observable characters have
emission probabilities which depend on the
current state.
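
The definitions above can be made concrete with a sketch of the Viterbi algorithm, which recovers the most probable sequence of hidden states for an observed sequence. The two-state model here (an "AT" state and a "GC" state with made-up probabilities) is a hypothetical toy, not a profile HMM, and the sketch assumes a non-empty observation sequence and no zero probabilities.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log probability, most probable state path) for the observations."""
    # best[t][s]: log-probability of the best path that ends in state s at time t.
    best = [{
        s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
        for s in states
    }]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Best predecessor depends only on the previous state (Markov property).
            prev, logp = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = logp + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Follow the back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


# Hypothetical toy model: the hidden state is the local base composition.
states = ("AT", "GC")
start_p = {"AT": 0.5, "GC": 0.5}
trans_p = {"AT": {"AT": 0.9, "GC": 0.1}, "GC": {"AT": 0.1, "GC": 0.9}}
emit_p = {
    "AT": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
    "GC": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4},
}
log_prob, path = viterbi("AATTGCGC", states, start_p, trans_p, emit_p)
print(path)
```

In a profile HMM the hidden states would be match, insert and delete states for each alignment column, but the path-finding step works the same way.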
