Title: Practical Aspects of Multiple Sequence Alignments


1
Practical Aspects of Multiple Sequence Alignments
  • Mike Thomas, Ph.D.
  • Bioinformatics Research Center, Medical College
    of Wisconsin

2
Outline: Practical aspects of MSA
  • Why it is important to accurately assess
    alignments
  • How to conduct MSAs
  • What we do with MSAs

3
Why do we do multiple sequence alignments?
  • Infer phylogenetic relationships
  • Understand evolutionary pressures acting on a
    gene
  • Formulate and test hypotheses about protein 3-D
    structure (based on conserved regions)
  • Formulate and test hypotheses about protein
    function
  • Understand how protein function has changed
  • Identify primers and probes to search for
    homologous sequences in other organisms

4
The relationship of MSA to phylogenetics
  • The goal of phylogenetics is to reconstruct
    evolutionary history using shared, derived
    characters
  • Characters that have a common evolutionary
    history (are homologous)
  • For example, eyes of humans and rats (but not
    humans and octopi)
  • Traditionally, morphological characters were
    used.
  • Now, DNA and amino acid sequence alignments are
    very common for phylogenetic reconstruction
  • It is assumed that properly aligned sequences
    represent homology

5
The relationship between MSA and evolutionary
history of a group of genes or organisms
[Figure: evolutionary tree of the sequences NFS, NFLS, NYLS, and NKYLS. Other labels recovered from the figure: -L, K, NYLS.]
6
Using known evolutionary relationship for
sequence alignment
[Figure: tree of the sequences NFS, NFLS, NYLS, and NKYLS, with the merged labels NFL/-S, NK/-YLS, and NK/-Y/FL/-S.]
7
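What happens when a sequence alignment is wrong?
[Figure: the unaligned sequences A = AGT, B = AT, C = ATC, three candidate alignments labeled I, II, and III, and three trees with the tip orders A B C, A C B, and B C A. Alignment strings as extracted:]
A AGT-   B A-T-   C A-TC
A AGT    B AT-    C ATC
A AGT-   B A-T-   C A-TC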
8
Parameter considerations and consequences:
transitions, transversions, and gaps
4 possible alignments of AATCGCG and AACCCGG

     Alignment     Gaps, tv, ts
A.   AATCGCG       0, 2, 1
     AACCCGG
B.   AATCGCG-      2, 0, 1
     AACC-CGG
C.   AATCGCG-      2, 1, 0
     AA-CCCGG
D.   AATCGC-G-     4, 0, 0
     AA-C-CCGG

  • transition rate
  • transversion rate
  • These are treated the same for long divergence
    times.

9
Parameter considerations and consequences:
transitions, transversions, and gaps
4 possible alignments of AATCGCG and AACCCGG

     Alignment     Indels, tvs, tss
A.   AATCGCG       0, 2, 1
     AACCCGG
B.   AATCGCG-      2, 0, 1
     AACC-CGG
C.   AATCGCG-      2, 1, 0
     AA-CCCGG
D.   AATCGC-G-     4, 0, 0
     AA-C-CCGG
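
The counts for the four candidate alignments can be checked with a short script. This is a minimal sketch, not part of the original slides: the function names and the example weights are hypothetical, and a gap column is counted once per position, which matches the numbers on the slide.

```python
# Count gaps, transversions (tv) and transitions (ts) in a pairwise DNA alignment.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# The four candidate alignments of AATCGCG and AACCCGG from the slide.
ALIGNMENTS = {
    "A": ("AATCGCG", "AACCCGG"),
    "B": ("AATCGCG-", "AACC-CGG"),
    "C": ("AATCGCG-", "AA-CCCGG"),
    "D": ("AATCGC-G-", "AA-C-CCGG"),
}


def count_events(top, bottom):
    """Return (gaps, transversions, transitions) for two aligned strings."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    gaps = tv = ts = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            gaps += 1
        elif a != b:
            # Purine<->purine or pyrimidine<->pyrimidine is a transition.
            if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
                ts += 1
            else:
                tv += 1
    return gaps, tv, ts


def cost(counts, gap_w, tv_w, ts_w):
    """Weighted cost of an alignment; the weights are illustrative assumptions."""
    gaps, tv, ts = counts
    return gaps * gap_w + tv * tv_w + ts * ts_w


def best_alignment(gap_w, tv_w, ts_w):
    """Label of the lowest-cost candidate under the given weights."""
    return min(
        ALIGNMENTS,
        key=lambda k: cost(count_events(*ALIGNMENTS[k]), gap_w, tv_w, ts_w),
    )


if __name__ == "__main__":
    for label, (top, bottom) in ALIGNMENTS.items():
        print(label, count_events(top, bottom))
    # Which alignment "wins" depends on the chosen penalties.
    print(best_alignment(gap_w=2, tv_w=1, ts_w=1))
    print(best_alignment(gap_w=1, tv_w=5, ts_w=1))
```

With expensive gaps the ungapped alignment A is preferred, while a heavy transversion penalty favors B. The parameter choice, not the data alone, decides which alignment is reported.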
10
Tools for MSA
  • Clustal: web server or run locally
  • Web server: http://www.ebi.ac.uk/clustalw/index.html
  • Manuscript with details: http://www.csc.fi/molbio/progs/clustalw/ms.html
  • Goal: find an optimal multiple alignment

11
ClustalW
  • CLUSTAL has a number of variations; the most
    commonly used is CLUSTALW.
  • Generates pairwise alignments of all input
    sequences, then ranks the identity scores among
    pairs of sequences.
  • High-scoring pairs of sequences align most
    readily to each other.
  • More divergent (less related) pairs are then
    added to the alignment.
  • Generates a phylogenetic tree of relationships to
    determine the order of steps in constructing the
    alignment.
  • One can view the phylogenetic tree used to
    generate the alignment.
  • Individual pairs are aligned using either a
    FASTA-type (word-based, fast) method or a
    dynamic programming algorithm, which is slower
    but produces optimal pairwise alignments.
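
The first steps described above (align every pair by dynamic programming, then rank the pairs by identity) can be sketched as follows. This is an illustration under stated assumptions, not ClustalW's actual implementation: the scoring values (match +1, mismatch -1, gap -1) and the function names are hypothetical, and ClustalW itself uses substitution matrices and separate gap-opening and gap-extension penalties.

```python
from itertools import combinations


def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global pairwise alignment by dynamic programming.

    Returns (score, aligned_a, aligned_b). Scoring values are assumptions.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


def rank_pairs(seqs):
    """Align all pairs and sort them by fraction of identical columns, best first."""
    ranked = []
    for a, b in combinations(seqs, 2):
        _, top, bottom = needleman_wunsch(a, b)
        identical = sum(x == y for x, y in zip(top, bottom))
        ranked.append((identical / len(top), a, b))
    return sorted(ranked, reverse=True)


if __name__ == "__main__":
    # Toy sequences taken from the earlier slides.
    for identity, a, b in rank_pairs(["NFS", "NFLS", "NYLS", "NKYLS"]):
        print(f"{a:6s}{b:6s}{identity:.2f}")
```

A progressive aligner would merge the top-ranked pairs first and add the more divergent sequences later, following the tree built from these pairwise scores.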

12
Unix version of ClustalX, the graphical interface
to ClustalW, run locally. Note colors for amino
acid qualities and score indicator.
13
Web ClustalW options
14
FOSB_MOUSE Protein fosB
MFQAFPGDYD SGSRCSSSPS
AESQYLSSVD SFGSPPTAAA SQECAGLGEM PGSFVPTVTA
ITTSQDLQWL VQPTLISSMA QSQGQPLASQ PPAVDPYDMP
GTSYSTPGLS AYSTGGASGS GGPSTSTTTS GPVSARPARA
RPRRPREETL TPEEEEKRRV RRERNKLAAA KCRNRRRELT
DRLQAETDQL EEEKAELESE IAELQKEKER LEFVLVAHKP
GCKIPYEEGP GPGPLAEVRD LPGSTSAKED GFGWLLPPPP
PPPLPFQSSR DAPPNLTASL FTHSEVQVLG DPFPVVSPSY
TSSFVLTCPE VSAFAGAQRT SGSEQPSDPL
NSPSLLAL

FOSB_HUMAN Protein fosB
MFQAFPGDYD
SGSRCSSSPS AESQYLSSVD SFGSPPTAAA SQECAGLGEM
PGSFVPTVTA ITTSQDLQWL VQPTLISSMA QSQGQPLASQ
PPVVDPYDMP GTSYSTPGMS GYSSGGASGS GGPSTSGTTS
GPGPARPARA RPRRPREETL TPEEEEKRRV RRERNKLAAA
KCRNRRRELT DRLQAETDQL EEEKAELESE IAELQKEKER
LEFVLVAHKP GCKIPYEEGP GPGPLAEVRD LPGSAPAKED
GFSWLLPPPP PPPLPFQTSQ DAPPNLTASL FTHSEVQVLG
DPFPVVNPSY TSSFVLTCPE VSAFAGAQRT SGSDQPSDPL
NSPSLLAL

FOS_CHICK Proto-oncogene protein c-fos
MMYQGFAGEY EAPSSRCSSA SPAGDSLTYY PSPADSFSSM
GSPVNSQDFC TDLAVSSANF VPTVTAISTS PDLQWLVQPT
LISSVAPSQN RGHPYGVPAP APPAAYSRPA VLKAPGGRGQ
SIGRRGKVEQ LSPEEEEKRR IRRERNKMAA AKCRNRRREL
TDTLQAETDQ LEEEKSALQA EIANLLKEKE KLEFILAAHR
PACKMPEELR FSEELAAATA LDLGAPSPAA AEEAFALPLM
TEAPPAVPPK EPSGSGLELK AEPFDELLFS AGPREASRSV
PDMDLPGASS FYASDWEPLG AGSGGELEPL CTPVVTCTPC
PSTYTSTFVF TYPEADAFPS CAAAHRKGSS
SNEPSSDSLS

FOS_RAT Proto-oncogene protein c-fos
MMFSGFNADY EASSSRCSSA SPAGDSLSYY HSPADSFSSM
GSPVNTQDFC ADLSVSSANF IPTVTAISTS PDLQWLVQPT
LVSSVAPSQT RAPHPYGLPT PSTGAYARAG VVKTMSGGRA
QSIGRRGKVE QLSPEEEEKR RIRRERNKMA AAKCRNRRRE
LTDTLQAETD QLEDEKSALQ TEIANLLKEK EKLEFILAAH
RPACKIPNDL GFPEEMSVTS LDLTGGLPEA TTPESEEAFT
LPLLNDPEPK PSLEPVKNIS NMELKAEPFD DFLFPASSRP
SGSETARSVP DVDLSGSFYA ADWEPLHSSS LGMGPMVTEL
EPLCTPVVTC TPSCTTYTSS FVFTYPEADS FPSCAAAHRK
GSSSNEPSSD SLSSPTLLAL

FOS_MOUSE Proto-oncogene protein c-fos
MMFSGFNADY EASSSRCSSA SPAGDSLSYY
HSPADSFSSM GSPVNTQDFC ADLSVSSANF IPTVTAISTS
PDLQWLVQPT LVSSVAPSQT RAPHPYGLPT QSAGAYARAG
MVKTVSGGRA QSIGRRGKVE QLSPEEEEKR RIRRERNKMA
AAKCRNRRRE LTDTLQAETD QLEDEKSALQ TEIANLLKEK
EKLEFILAAH RPACKIPDDL GFPEEMSVAS LDLTGGLPEA
STPESEEAFT LPLLNDPEPK PSLEPVKSIS NVELKAEPFD
DFLFPASSRP SGSETSRSVP DVDLSGSFYA ADWEPLHSNS
LGMGPMVTEL EPLCTPVVTC TPGCTTYTSS FVFTYPEADS
FPSCAAAHRK GSSSNEPSSD SLSSPTLLAL

Sequence data for two related genes: fosB from
mouse and human, and c-fos from chicken, mouse,
and rat.
15
  • Significant differences between FosB and C-Fos.
  • Rat and mouse C-Fos sequences differ from chicken
    C-Fos.
  • Long conserved region between 130 and 225.
  • Symbols
  • * Identity across all sequences
  • : Conservation of amino acid characteristics
  • . Semi-conserved substitutions
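
As a rough illustration of the identity symbol, the sketch below marks fully identical columns of an alignment with "*", as ClustalW-style output does. It is a simplification: the ":" and "." symbols depend on amino acid substitution groups and are not computed here. The function name is hypothetical, and the toy alignment uses the short sequences from the earlier slides.

```python
def identity_line(rows):
    """Return a marker line: '*' under columns identical in every row, ' ' elsewhere.

    Columns that contain a gap are never marked.
    """
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    marks = []
    for column in zip(*rows):
        residues = set(column)
        marks.append("*" if len(residues) == 1 and "-" not in residues else " ")
    return "".join(marks)


if __name__ == "__main__":
    alignment = ["N-F-S", "N-FLS", "N-YLS", "NKYLS"]
    for row in alignment:
        print(row)
    print(identity_line(alignment))
```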

16
  • To better visualize conservation, colors can be
    used.
  • Color code
  • AVFPMILW: RED, Small (small + hydrophobic
    (incl. aromatic - Y))
  • DE: BLUE, Acidic
  • RHK: MAGENTA, Basic
  • STYHCNGQ: GREEN, Hydroxyl + Amine + Basic - Q
  • Others: Grey
  • Differences between the genes and species are
    more apparent.

17
Neighbor-joining tree constructed with a
web-ClustalW applet (Jalview). FosB and c-Fos can
be distinguished. Rat and mouse cluster apart from
chicken with respect to c-Fos.
18
A: Highly conserved region
B: Rather dissimilar region
19
Threonyl-tRNA synthetase (thrS2) gene w/
consensus sequence
20
Threonyl-tRNA synthetase (thrS2) gene in 6 species
21
MALIGN
  • construction of pairwise MOTIFS (conserved
    regions of similarity without gaps)
  • construction of MULTIPLE MOTIFS (of thickness
    exceeding 2)
  • forming of SUPERMOTIFS (groupings of motifs that
    are near each other) from MULTIPLE MOTIFS
  • construction of MULTIPLE ALIGNMENTS from
    previously obtained MOTIFS and SUPERMOTIFS and
    consequent selection of the best alignment.
  • http://www.genebee.msu.su/services/malign_full.html

23
Each motif and supermotif has a score and involves
a pair of sequences
24
Malign
Possibility of receiving a sum of mismatch
weights along an alignment for random sequences
of the same length; includes parameters for gaps
and mismatches
25
The relationship of MSA to phylogenetics
  • The goal of phylogenetics is to reconstruct
    evolutionary history using shared, derived
    characters
  • Characters that have a common evolutionary
    history (are homologous)
  • For example, eyes of humans and rats (but not
    humans and octopi)
  • Traditionally, morphological characters were
    used.
  • Now, DNA and amino acid sequence alignments are
    very common for phylogenetic reconstruction
  • It is assumed that properly aligned sequences
    represent homology

26
The relationship of MSA to phylogenetics
AHFGEPDFTV WNAGQFPANL HTQ-DMSSKS TIEINFKAME MIILGTEYAG
ENFGEPDFTV WNAGQFPANT HTS-GMTSKT TVEINFKQME MVILGTEYAG
KNFGEPDFTI YNAGQFPANI HTK-GMTSAT SVEINFKDME MVILGTEYAG
EDFGTPDFTI YNAGQFPCNR YTH-YMTSST SIDLNLARRE MVIMGTQYAG
ESFGTPDFTI YNAGQFPCNR YTH-YMTSST SVDLNLARRE MVILGTQYAG
LVGFKPDFVV MNGSKVTNPN WKEQGLNSEN FVAFNLTEGV QLIGGTWYGG
LKNFEPDFVV MNGSKVTNPN WKEQGLNSEN FVAFNLTERI QLIGGTWYGG
LAHFKPDFVV MNGAKCTNAK WKEHGLNSEN FTVFNLTERM QLIGGTWYGG
LKGFEPDFVV LNASKAKVEN FKELGLNSET AVVFNLAEKM QIILNTWYGG
LANFKPDFVV YNASKAKVEN YKELGLHSET AVVFNLTSRE QVIINTWYGG
LENFKADFIV YNACKCINED YKQDGLNSEV FVIFNVEENI AVIGGTWYGG
ATKIKPNFTI VSAPHFKADP EVD-GTKSET FVIISFKHKV ILIGGTEYAG
KTVEQP-FTI LSAPHFKADP KTD-GTHSET FIIVSFEKRT ILIGGTEYAG
-PAGKDEWQV LNVANFECVP ERD-GTNSDG CVILNFAQKK VLIAGMRYAG
LPSFQPKLTI IDLPSFKADP VRH-GCRSET VIACDLTNGL VLIGGTSYAG
LASFLPKLTI IDLPSFKANP ERH-GCRGET IIACDLTKGL VLIGGTSYAG
LGQFVPEMTI IDLPSFRADP ARH-GSRTET VIAVDLTRQI VLIGGTSYAG
LENFVPELTL IDLPSFRADP KRH-GCRSEN VVAIDFARKI VLIGGTQYAG
----SYDMVT IDVP------ -----SYSDV WMLVERRSNS TLVLGSDYYG
Phosphoenolpyruvate carboxylase kinase (PPCK)
gene in 19 species
PPCK_AERPE
27
  • Phosphoenolpyruvate carboxylase kinase (PPCK)
    gene in 19 species, 720 sites.
  • Standard Neighbor-Joining tree constructed by
    ClustalW
  • The tree will differ with varying tree-building
    and distance-estimation methods: how do we know
    which to use?
  • Different methods will provide significantly
    different estimates of branch lengths, especially
    for the long branch.
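
To see why the distance-estimation method matters most for long branches, compare the raw proportion of differing sites (p-distance) with the Poisson-corrected distance d = -ln(1 - p), a standard correction for multiple substitutions in amino acid sequences. This is a minimal sketch, not the method used for the tree on the slide. The function names are hypothetical, and the two example rows are assumed to be the first two sequences of the PPCK alignment excerpt on slide 26.

```python
import math


def p_distance(a, b):
    """Proportion of differing sites, skipping columns with a gap in either row."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def poisson_distance(p):
    """Poisson-corrected distance d = -ln(1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return -math.log(1.0 - p)


if __name__ == "__main__":
    # Assumed to be the first two rows of the slide 26 excerpt (spaces removed).
    row1 = "AHFGEPDFTVWNAGQFPANLHTQ-DMSSKSTIEINFKAMEMIILGTEYAG"
    row2 = "ENFGEPDFTVWNAGQFPANTHTS-GMTSKTTVEINFKQMEMVILGTEYAG"
    p = p_distance(row1, row2)
    print(f"p = {p:.3f}, Poisson d = {poisson_distance(p):.3f}")
    # The correction grows with p, so long branches change the most.
    for p in (0.1, 0.3, 0.6):
        print(f"p = {p:.1f} -> d = {poisson_distance(p):.3f}")
```

For small p the two estimates nearly coincide. For highly diverged pairs the corrected distance is much larger, which is one reason branch-length estimates for the long branch depend strongly on the method.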

28
Next time: inferring evolutionary history from
DNA and amino acid sequence alignments
  • Introduction to phylogenetic approaches
  • Maximum Parsimony
  • Minimum Evolution
  • Maximum Likelihood
  • Introduction to tree-building methods
  • Assessing phylogenetic reconstructions
  • Practical uses of phylogenies