Improving Sequence Alignment For Protein Modelling

1
Improving Sequence Alignment For Protein Modelling
  • Danielle Talbot
  • Supervisor: Dr. Andrew Martin
  • University of Reading
  • E-mail: d.talbot_at_rdg.ac.uk

2
Overview
  • Why model?
  • The problems of comparative modelling
  • The importance of correct alignments
  • Analyzing misaligned sequences
  • Scoring models to select the best
  • Current and Future work

3
Why Model Protein Sequences?
  • GenPept: >1.53M protein sequences
  • PDB: 22,500 structures
  • Both increasing exponentially
  • Experimental determination slow
  • Some structures cannot be determined
    experimentally
  • Comparative Modelling offers an alternative

4
  • Identify parents
  • Align target and parent(s)
  • Identify SCRs/SVRs
  • Build SCRs from parent(s)
  • Build SVRs ab initio or from database
  • Build sidechains
  • Evaluate model
  • OK? NO: refine model. YES: final model

5
Major Problems in CM
  • Correct sequence alignment
  • Modelling SVRs
  • Decided to concentrate on sequence alignment

6
Importance of correct alignment
7
Difficulties with Alignments
  • Low sequence identity
  • Insertions / deletions:
      • Number of indels
      • Length of indels

8
Why Sequence Alignment Doesn't Work
  • Needleman-Wunsch finds optimum global alignment
    between two sequences
  • Optimum depends simply on a similarity matrix
  • Does not account for factors in 3D:
      • secondary structures
      • charges
      • hydrophobicity
      • distance constraints affecting indels
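
As an illustration of the point above, here is a minimal sketch of Needleman-Wunsch global alignment. It assumes a simple match/mismatch score and a linear gap penalty in place of a real similarity matrix; the function name and the default scores are illustrative, not taken from the presentation. Note that nothing in the recurrence knows about 3D structure: the optimum depends only on the per-residue scores and the gap penalty.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b.

    Returns (score, aligned_a, aligned_b). The scoring values are
    illustrative assumptions, not a real substitution matrix.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,  # pair a[i-1] with b[j-1]
                              score[i - 1][j] + gap,    # gap in b
                              score[i][j - 1] + gap)    # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append('-')
            i -= 1
        else:
            out_a.append('-'); out_b.append(b[j - 1])
            j -= 1
    return score[n][m], ''.join(reversed(out_a)), ''.join(reversed(out_b))
```

In practice a substitution matrix and affine gap penalties would replace the three constants, but the alignment would still be chosen without reference to secondary structure, charge, hydrophobicity or distance constraints.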

9
Improving Alignment
  • Only successful scoring will be in 3D
  • Two problems:
      • Generating (sensible) alternative alignments
      • Scoring the resulting structures

10
Generating Alternative Alignments
11
Misleading Local Sequence Alignments
  • MLSAs: extreme case of misalignment
  • regions where an apparently obvious sequence
    alignment is not observed in the structure
  • not restricted to protein pairs with particularly
    high or low global sequence similarity
  • Analysed to suggest causes of error

12
Saqi, Russell & Sternberg
Cytochrome b Reductase
Pig     VDLVIKVYFKDTHP-
E.coli  -FDLLVKVYFKNEHP
9 out of 14 identities
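
The identity count on this slide can be checked with a short sketch. The helper name `count_identities` is hypothetical, and treating the gapped layout shown on the slide as the structure-based register is an assumption.

```python
def count_identities(a, b):
    """Count aligned columns where both sequences carry the same residue."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x == y and x != '-')

pig = "VDLVIKVYFKDTHP"
ecoli = "FDLLVKVYFKNEHP"

# Ungapped, residue-for-residue comparison (the "obvious" sequence alignment).
obvious = count_identities(pig, ecoli)

# One-residue offset, as in the gapped layout shown on the slide.
shifted = count_identities(pig + "-", "-" + ecoli)

print(obvious, "of", len(pig), "identical when aligned without gaps")
print(shifted, "identical in the shifted register")
```

The ungapped comparison reproduces the 9 out of 14 identities quoted above, while the shifted register has none, which is what makes the local sequence alignment misleading.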
13
Finding MLSAs
  • Using CATH H-families, select all pairs of NREPs
    (56,000 pairs)
  • For each pair:
      • Perform NW sequence alignment
      • Perform SSAP structural alignment
      • MLSA: window of 10 aa has ≥5 aa misaligned
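
The window criterion above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: it assumes each alignment is given as a list mapping residue i of the first protein to its partner index in the second (or None for a gap), and that the threshold is "at least 5 of 10". The function names are hypothetical.

```python
def misaligned_flags(seq_map, struc_map):
    """True where the sequence and structural alignments pair a residue differently.

    seq_map[i] / struc_map[i] hold the index in protein B aligned to residue i
    of protein A, or None if residue i is aligned to a gap.
    """
    if len(seq_map) != len(struc_map):
        raise ValueError("both maps must cover the same residues")
    return [s != t for s, t in zip(seq_map, struc_map)]

def mlsa_windows(seq_map, struc_map, window=10, min_misaligned=5):
    """Return start positions of windows with at least min_misaligned mismatches."""
    flags = misaligned_flags(seq_map, struc_map)
    hits = []
    for start in range(len(flags) - window + 1):
        if sum(flags[start:start + window]) >= min_misaligned:
            hits.append(start)
    return hits
```

Overlapping hits would normally be merged into a single region before counting pairs; that step is left out here.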

14
Finding MLSAs
  • 8% (4,500) of pairs had an MLSA
  • 0.13% (82) of pairs had ≥6 aa misaligned
  • 31 of these had Sseq ≥ 2 Sstruc
  • 9 of these were genuine. Others were:
      • errors in CATH domain assignments
      • SSAP errors
      • arbitrary structural alignments
        (non-globular proteins, highly flexible regions)

15
Genuine MLSAs
  • One because of a hinge region

16
Genuine MLSAs
  • Six were terminal

17
Genuine MLSAs
  • Four minimized exposure of hydrophobics

18
Sequence Structure MisAlignments
  • SSMAs are less extreme examples
  • Simply local regions where sequence and structure
    alignments do not match
  • Have identified these
  • Currently being analysed
  • Analysis will give rules for generating
    alternative alignments

19
Evaluating Alternative Models
20
Evaluating Alternative Models
  • Ultimate evaluation: comparison with crystal
    structure
  • Not applicable in real-life modelling!
  • Successful evaluation: empirically based
      • Empirical potentials (PROSA-II, RAM, etc.)
      • Rule-based (neural nets, etc.)

21
The RAM Potential
  • Atom-level empirical potential
  • Developed by Ram Samudrala and John Moult

22
Evaluating the RAM Potential
  • Using CATH H-families, select all pairs of NREPs
    (56,000 pairs)
  • For each pair:
      • Perform NW sequence alignment
      • Perform SSAP structural alignment
      • Select one as parent and one as target
      • Create model from each alignment with MODELLER
      • Calculate RMSd (model vs. parent) and RAM
        potential
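
The RMSd step in the protocol above can be sketched as below. This is a minimal version that assumes the two coordinate sets are already superposed and list equivalent atoms in the same order; it performs no fitting, and the function name is illustrative.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z).

    Assumes the structures are already superposed and that coords_a[i] and
    coords_b[i] are equivalent atoms (for example, matched C-alpha atoms).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords_a, coords_b):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(coords_a))
```

A real pipeline would first find the optimal superposition (for instance with a least-squares fit) before applying this formula.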

23
Results of Large Scale Analysis
Values for the structurally aligned model
24
How can Models Based on Sequence Alignments be
Better Than Those Based on Structural Alignment?
  • Structural alignment is the gold standard we
    aim to obtain with sequence alignment
  • HOWEVER
  • RMS(seq) < RMS(struc) in 9.3% of cases

Why?
25
How can Models Based on Sequence Alignments be
Better Than Those Based on Structural Alignment?
  • All cases examined so far have a large indel
  • Parent: GSVIQMRLVNYIPLADLPSSVWY
  • Seq:    ATVLNMR-----------STLWY
  • Struc:  ATVLNMRS-----------TLWY
  • The ΔRMSd is usually very small (<0.5 Å)
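
A small sketch makes the example above concrete: the two alignments of the target differ only in which side of the indel one residue is placed. The helper names are hypothetical, and the strings are rebuilt from the slide on the assumption that the gap is 11 residues long, so that each row matches the 23-residue parent.

```python
def differing_columns(aln1, aln2):
    """Column indices (0-based) where two alignment rows disagree."""
    if len(aln1) != len(aln2):
        raise ValueError("alignment rows must have equal length")
    return [i for i, (x, y) in enumerate(zip(aln1, aln2)) if x != y]

def longest_gap(aln):
    """Length of the longest run of '-' characters in an alignment row."""
    best = run = 0
    for ch in aln:
        run = run + 1 if ch == '-' else 0
        best = max(best, run)
    return best

parent = "GSVIQMRLVNYIPLADLPSSVWY"
seq = "ATVLNMR" + "-" * 11 + "STLWY"     # sequence-based alignment of the target
struc = "ATVLNMRS" + "-" * 11 + "TLWY"   # structure-based alignment of the target

print(differing_columns(seq, struc))      # columns where the two rows disagree
print(longest_gap(seq), longest_gap(struc))
```

Only two columns differ, both at the edges of the same long gap, which is consistent with the very small RMSd difference between the two models.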

26
Conclusions
  • Causes of MLSAs have been identified:
      • Terminal regions
      • Hydrophobicity / accessibility
      • Structurally flexible regions
  • SSMAs identified
  • Performance of RAM potential evaluated

27
Future Work
  • Re-run MLSA analysis with latest dataset
  • Analyse trends seen in SSMAs (Seq-Str
    Mis-Alignments)
  • Create rules for generating alternative
    alignments
  • Use analysis to train a neural net to predict
    correct alignments
  • Compare results with RAM and PROSA-II potentials

28
Acknowledgements
  • I'd like to thank:
      • Dr. Andrew Martin
      • Everyone in the Bioinformatics Lab. at Reading
      • MRC