Improving Sequence Alignment For Protein Modelling

1
Improving Sequence Alignment For Protein Modelling

Danielle Talbot
Supervisor: Dr. Andrew Martin
University of Reading
E-mail: d.talbot_at_rdg.ac.uk

2
Overview

Why model?
The problems of comparative modelling
The importance of correct alignments
Analyzing misaligned sequences
Scoring models to select the best
Current and Future work

3
Why Model Protein Sequences?

GenPept: >1.53M protein sequences
PDB: 22,500 structures
Both increasing exponentially
Experimental determination slow
Some structures cannot be determined experimentally
Comparative Modelling offers an alternative

4
Identify Parents
Align target and parent(s)
Identify SCRs/SVRs
Build SCRs from parent(s)
Build SVRs ab initio or database
Build sidechains
Evaluate model
OK?
NO: Refine model
YES: Final model

5
Major Problems in CM

Correct sequence alignment
Modelling SVRs
Decided to concentrate on sequence alignment

6
Importance of correct alignment
7
Difficulties with Alignments

Low sequence identity
Insertions / deletions
Number of indels
Length of indels

8
Why Sequence Alignment Doesn't Work

Needleman-Wunsch finds optimum global alignment between two sequences
Optimum depends simply on a similarity matrix
Does not account for factors in 3D:
  secondary structures
  charges
  hydrophobicity
  distance constraints affecting indels
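
A minimal sketch of Needleman-Wunsch global alignment, to show that the optimum is decided purely by per-residue scores and a gap penalty, with no 3D information. The function name and the match/mismatch/gap values are illustrative assumptions; a real run would use a similarity matrix (for example a BLOSUM or PAM table) in place of the simple match/mismatch score.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b.

    Returns (score, aligned_a, aligned_b). Scoring values are
    illustrative assumptions, not taken from the slides.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,  # pair residues
                              score[i - 1][j] + gap,    # gap in b
                              score[i][j - 1] + gap)    # gap in a
    # Trace back from the bottom-right corner to recover one optimum.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Nothing in the recurrence knows about secondary structure, charge, hydrophobicity or distance constraints, which is the limitation this slide lists.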

9
Improving Alignment

Only successful scoring will be in 3D
Two problems:
  Generating (sensible) alternative alignments
  Scoring the resulting structures

10
Generating Alternative Alignments
11
Misleading Local Sequence Alignments

MLSAs: extreme case of misalignment
  regions where an apparently obvious sequence alignment is not observed in the structure
  not restricted to protein pairs with particularly high or low global sequence similarity
Analysed to suggest causes of error

12
Saqi, Russell & Sternberg
Cytochrome b Reductase
Pig     VDLVIKVYFKDTHP-
E.coli  -FDLLVKVYFKNEHP
9 out of 14 identities
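
A small check of the identity count for the two segments on this slide. The sketch compares the ungapped pairing with the one-residue offset suggested by the gap characters; reading that offset as the structure-based pairing is an interpretation, not something the slide states. The helper name count_identities is hypothetical.

```python
def count_identities(row_a, row_b):
    """Count columns where both rows carry the same residue (gaps ignored)."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    return sum(1 for x, y in zip(row_a, row_b) if x == y and x != "-")


pig = "VDLVIKVYFKDTHP"
ecoli = "FDLLVKVYFKNEHP"

# Ungapped pairing: the "apparently obvious" sequence alignment.
ungapped = count_identities(pig, ecoli)

# Pairing shifted by one residue, as the gap characters suggest.
shifted = count_identities(pig + "-", "-" + ecoli)

print(ungapped, shifted)
```

The ungapped pairing gives the 9 identities quoted on the slide, while the shifted pairing has none.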
13
Finding MLSAs

Using CATH H-families, select all pairs of NREPs (56,000 pairs)
For each pair:
  Perform NW sequence alignment
  Perform SSAP structural alignment
MLSA: window of 10 aa has ≥5 aa misaligned
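
The window test above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: a residue of the first protein counts as misaligned when the sequence alignment and the structural alignment pair it with different partners (or with a gap in one and a residue in the other), and the threshold is read as "at least 5 in a window of 10". The function names are hypothetical.

```python
def aligned_pairs(row_a, row_b):
    """Map each residue index of A to its partner index in B (None = gap)."""
    pairs = {}
    i = j = 0
    for x, y in zip(row_a, row_b):
        if x != "-":
            pairs[i] = j if y != "-" else None
        if x != "-":
            i += 1
        if y != "-":
            j += 1
    return pairs


def find_mlsa_windows(seq_aln, struct_aln, window=10, min_misaligned=5):
    """Return start indices (in A) of windows with enough misaligned residues.

    seq_aln and struct_aln are (row_a, row_b) tuples of gapped strings
    describing the same two sequences.
    """
    seq_pairs = aligned_pairs(*seq_aln)
    struct_pairs = aligned_pairs(*struct_aln)
    n = len(seq_pairs)
    # True where the two alignments disagree about residue i of A.
    misaligned = [seq_pairs[i] != struct_pairs[i] for i in range(n)]
    hits = []
    for start in range(n - window + 1):
        if sum(misaligned[start:start + window]) >= min_misaligned:
            hits.append(start)
    return hits
```

For example, calling find_mlsa_windows(("VDLVIKVYFKDTHP", "FDLLVKVYFKNEHP"), ("VDLVIKVYFKDTHP-", "-FDLLVKVYFKNEHP")) flags every 10-residue window, because a one-residue shift changes the partner of every residue.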

14
Finding MLSAs

8% (4,500) of pairs had an MLSA
0.13% (82) of pairs had ≥6 aa misaligned
31 of these had Sseq ≥ 2Sstruc
9 of these were genuine. Others were:
  errors in CATH domain assignments
  SSAP errors
  arbitrary structural alignments (non-globular proteins, highly flexible regions)

15
Genuine MLSAs

One because of a hinge region

16
Genuine MLSAs

Six were terminal

17
Genuine MLSAs

Four minimized exposure of hydrophobics

18
Sequence Structure MisAlignments

SSMAs are less extreme examples
Simply local regions where sequence and structure alignments do not match
Have identified these
Currently being analysed
Analysis will give rules for generating alternative alignments

19
Evaluating Alternative Models
20
Evaluating Alternative Models

Ultimate evaluation: comparison with crystal structure
  Not applicable in real-life modelling!
Successful evaluation: empirically based
  Empirical potentials (PROSA-II, RAM, etc.)
  Rule-based (neural nets, etc.)

21
The RAM Potential

Atom-level empirical potential
Developed by Ram Samudrala and John Moult

22
Evaluating the RAM Potential

Using CATH H-families, select all pairs of NREPs (56,000 pairs)
For each pair:
  Perform NW sequence alignment
  Perform SSAP structural alignment
  Select one as parent and one as target
  Create model from each alignment with MODELLER
  Calculate RMSd (model vs. parent) and RAM potential
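
The RMSd step can be illustrated with a short sketch. It assumes the two structures have already been superposed and that equivalent atoms (for example C-alpha atoms) are listed in the same order; the superposition itself, the MODELLER model building and the RAM potential are not reproduced here. The function name rmsd is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z).

    Assumes the coordinate sets are already superposed and that
    coords_a[k] and coords_b[k] refer to equivalent atoms.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords_a, coords_b):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(coords_a))
```

A lower value means the two coordinate sets agree more closely, so it can be set beside the potential computed for the same model.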

23
Results of Large Scale Analysis
Values for the structurally aligned model
24
How can Models Based on Sequence Alignments be Better Than Those Based on Structural Alignment?