Protein Structure Prediction - PowerPoint PPT Presentation

Slides: 74
Provided by: JennyY151

Transcript and Presenter's Notes

Title: Protein Structure Prediction


1
Protein Structure Prediction
  • Sequence database searching
  • Domain assignment
  • Multiple sequence alignment
  • Comparative or homology modeling
  • Secondary structure prediction

2
(No Transcript)
3
(No Transcript)
4
Homologous Proteins
  • The term homology, as used in a biological
    context, is defined as similarity of structure,
    physiology, development and evolution of
    organisms based upon common genetic factors.
  • The statement that two proteins are homologous
    implies that their genes have evolved from a
    common ancestral gene. They often have
    similar functions.
  • Two proteins are considered to be homologous when
    they have identical amino acid residues in a
    significant number of sequential positions along
    the polypeptide chains (> 30%).
  • Homologous proteins have conserved structural
    cores and variable loop regions.
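
The identity threshold above can be checked on a pairwise alignment with a short calculation. The following is a minimal sketch, assuming the two sequences are already aligned (equal length, "-" for gaps) and that identity is counted only over columns where neither sequence has a gap; other denominators are also in common use. The function name percent_identity is illustrative and does not come from the slides.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues over columns with no gap in either sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped columns are skipped (an assumption of this sketch)
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0
```

A pair scoring above roughly 30 over a long alignment would meet the criterion stated on this slide; short alignments can reach that value by chance.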

5
The Divergence of Amino-acid Sequence and 3D
Structure for the Core Region of Homologous
Proteins
  • Known structures of 32 pairs of homologous
    proteins such as globins, serine proteinases, and
    immunoglobulin domains have been compared. The
    root mean square deviation of the main-chain
    atoms of the core regions is plotted as a
    function of amino acid homology. The curve
    represents the best fit of the dots to an
    exponential function. Pairs with high sequence
    homology are almost identical in
    three-dimensional structure, whereas deviations
    in atomic positions for pairs of low homology are
    on the order of 2 Å.
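
The root mean square deviation plotted on this slide can be computed directly from paired atomic coordinates. This is a minimal sketch, assuming the two coordinate sets are already optimally superposed and list equivalent main-chain atoms in the same order; it performs no fitting. The function name rmsd is illustrative.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords_a, coords_b):
        # squared distance between equivalent atoms
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(coords_a))
```

Coordinates in Å give an RMSD in Å, so the value is directly comparable with the 2 Å figure quoted above.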

6
A Generalized Approach to Predicting Protein
Structure
  • Relevant experimental data
  • Sequence data/preliminary analysis
  • Sequence Database searching
  • Domain assignment
  • Multiple sequence alignment
  • Comparative or homology modeling
  • Secondary structure prediction
  • Fold Recognition
  • Analysis of folds and alignment of secondary
    structures
  • Sequence to structure alignment

7
Flow Chart
  • This flowchart assumes that the protein is
    soluble, likely comprises a single domain, and
    does not contain non-globular regions.

8
Experimental Data
  • Much experimental data can aid the structure
    prediction process.
  • Some of these are listed below:
  • Disulphide bonds, which provide tight restraints
    on the location of cysteines in space
  • Spectroscopic data, which can give ideas as to
    the secondary structure content of the protein
  • Site-directed mutagenesis studies, which can give
    insights as to residues involved in active or
    binding sites
  • Knowledge of proteolytic cleavage sites or
    post-translational modifications, such as
    phosphorylation or glycosylation, which can
    suggest residues that must be accessible, etc.
  • Remember to keep all of the available data in
    mind when doing predictive work. Always ask
    whether a prediction agrees with the results of
    experiments. If not, then it may be necessary to
    modify what has been completed.

9
Protein Sequence Data
  • There is some value in doing some initial
    analysis on the protein sequence. If a protein
    has come (for example) directly from a gene
    prediction, it may consist of multiple domains.
    More seriously, it may contain regions that are
    unlikely to be globular, or soluble.
  • Is the protein a transmembrane protein, or does
    it contain transmembrane segments? There are many
    methods for predicting these segments, including:
  • TMAP (EMBL) http://www.mbb.ki.se/tmap/index.html
  • PredictProtein (EMBL/Columbia)
    http://dodo.cpmc.columbia.edu/predictprotein/
  • TMHMM (CBS, Denmark)
  • TMpred (Baylor College)
  • DAS (Stockholm)
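
The servers listed above use more elaborate methods, but the basic idea of spotting a hydrophobic stretch long enough to span a membrane can be illustrated with a sliding-window hydropathy average. This is a minimal sketch, assuming the Kyte-Doolittle hydropathy scale, a 19-residue window and a cutoff of 1.6 (values often quoted for that scale, treated here as assumptions). It is not the algorithm used by TMAP, TMHMM or any other server named on this slide, and the function names are illustrative.

```python
# Kyte-Doolittle hydropathy values per residue
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[aa] for aa in seq.upper()]  # KeyError on non-standard residues
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]

def candidate_tm_windows(seq, window=19, threshold=1.6):
    """0-based start positions of windows whose mean hydropathy reaches the cutoff."""
    return [i for i, value in enumerate(hydropathy_profile(seq, window))
            if value >= threshold]
```

Any window flagged this way is only a candidate; it is worth confirming with one of the dedicated servers before splitting the sequence.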

10
http://www.mbb.ki.se/tmap/index.html
11
COILS - Prediction of Coiled Coil Regions in
Proteins
  • Does the protein contain coiled-coils?
    Prediction of coiled coils can be completed at
    the COILS server or by downloading the COILS
    program (http://www.ch.embnet.org/software/COILS_form.html).
  • COILS is a program that compares a sequence to a
    database of known parallel two-stranded
    coiled-coils and derives a similarity score. By
    comparing this score to the distribution of
    scores in globular and coiled-coil proteins, the
    program then calculates the probability that the
    sequence will adopt a coiled-coil conformation.
  • COILS was described in Lupas, A., Van Dyke, M.,
    and Stock, J. (1991) Predicting Coiled Coils
    from Protein Sequences, Science 252:1162-1164.

12
(No Transcript)
13
Does the Protein Contain Regions of Low
Complexity?
  • Proteins frequently contain runs of
    poly-glutamine or poly-serine, which do not
    predict well. To check for this the program SEG
    (a version of SEG is also contained within the
    GCG suite of programs) can be employed.
    ftp://ftp.ncbi.nlm.nih.gov/pub/seg/seg/
  • If the answer to any of the above questions is
    yes, then it is worthwhile trying to break the
    sequence into pieces or ignore particular
    sections of the sequence, etc. This is related
    to the problem of locating domains.
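
The idea behind a low-complexity check can be illustrated with a sliding-window Shannon entropy. This is a deliberately simplified sketch and not the SEG algorithm itself; the 12-residue window and the 2.2-bit cutoff are assumptions chosen for illustration, as are the function names.

```python
import math
from collections import Counter

def window_entropy(segment):
    """Shannon entropy (bits) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def low_complexity_windows(seq, window=12, max_entropy=2.2):
    """0-based starts of windows whose composition entropy is at or below the cutoff."""
    seq = seq.upper()
    return [i for i in range(len(seq) - window + 1)
            if window_entropy(seq[i:i + window]) <= max_entropy]
```

A poly-glutamine run has an entropy of zero and is flagged, whereas a window containing twelve different residues has an entropy of about 3.6 bits and is not.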

14
Multiple Sequence Alignment
  • Alignments can provide:
  • Information on protein domain structure
  • The location of residues likely to be involved in
    protein function
  • Information on residues likely to be buried in
    the protein core or exposed to solvent
  • More information than a single sequence for
    applications like homology modeling and
    secondary structure prediction.

15
(No Transcript)
16
Sequence Database Searching
  • The most obvious first stage in the analysis of
    any new sequence is to perform comparisons with
    sequence databases to find homologues. These
    searches can now be performed just about anywhere
    and on just about any computer. In addition,
    there are numerous web servers for doing
    searches, where one can post or paste a sequence
    into the server and receive the results
    interactively.

17
Sequence Database Searching
  • There are many methods for sequence searching.
    By far the best known is the BLAST suite of
    programs. One can easily obtain versions to run
    locally (from either NCBI or Washington
    University), and there are many web pages that
    permit one to compare a protein or DNA sequence
    against a multitude of gene and protein sequence
    databases. To name just a few:
  • National Center for Biotechnology Information
    (USA) Searches
  • http://www.ncbi.nlm.nih.gov/BLAST/
  • European Bioinformatics Institute (UK) Searches
  • http://www2.ebi.ac.uk/
  • BLAST search through SBASE (domain database,
    ICGEB, Trieste)

18
BLAST
  • One of the most important advances in sequence
    comparison recently has been the development of
    both gapped BLAST and PSI-BLAST
    (position-specific iterated BLAST).
  • Both of these have made BLAST much more
    sensitive, and the latter is able to detect very
    remote homologues by taking the results of one
    search, constructing a profile and then using
    this to search the database again to find other
    homologues (the process can be repeated until no
    new sequences are found).
  • It is essential that one compares any new protein
    sequence to the database with PSI-BLAST to see if
    known structures can be found prior to doing any
    of the other methods discussed in the next
    sections.

19
(No Transcript)
20
Sequence Database Searching
  • Other methods for comparing a single sequence to
    a database include:
  • The FASTA suite (William Pearson, University of
    Virginia, USA)
  • http://alpha10.bioch.virginia.edu/fasta/
  • SCANPS (Geoff Barton, European Bioinformatics
    Institute, UK)
  • http://barton.ebi.ac.uk/new/software.html
  • BLITZ (Compugen's fast Smith-Waterman search)
  • http://www2.ebi.ac.uk/bic_sw/

21
Multiple Sequence Database Searching
  • It is also possible to use multiple sequence
    information to perform more sensitive searches.
    Essentially this involves building a profile from
    some kind of multiple sequence alignment. A
    profile essentially gives a score for each type
    of amino acid at each position in the sequence,
    and generally makes searches more sensitive.
  • Tools for doing this include:
  • PSI-BLAST (NCBI, Washington)
  • ProfileScan Server (ISREC, Geneva)
  • http://www.isrec.isb-sib.ch/software/PFSCAN_form.html
  • HMMER Hidden Markov Model searching (Sean Eddy,
    Washington University)
  • http://hmmer.wustl.edu/
  • Wise package (Ewan Birney, Sanger Centre; this is
    for protein versus DNA comparisons) and several
    others.
  • http://www.sanger.ac.uk/Software/Wise2/
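
To make the notion of a profile concrete, the sketch below builds a position-specific score table from a gap-free multiple alignment and scores a query against it. It is a toy illustration under stated assumptions: a uniform background frequency of 1/20, a flat pseudocount, and no sequence weighting. PSI-BLAST, ProfileScan and HMMER use considerably more refined schemes, and the function names here are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(alignment, pseudocount=1.0):
    """Log-odds score for each amino acid at each alignment column."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # uniform background (assumption)
    profile = []
    for col in range(length):
        residues = [seq[col].upper() for seq in alignment
                    if seq[col].upper() in AMINO_ACIDS]  # gaps are ignored
        total = len(residues) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (residues.count(aa) + pseudocount) / total
            column[aa] = math.log2(freq / background)
        profile.append(column)
    return profile

def score_sequence(profile, seq):
    """Sum of per-position profile scores for an ungapped query of equal length."""
    if len(seq) != len(profile):
        raise ValueError("query length must match profile length")
    return sum(column[aa] for column, aa in zip(profile, seq.upper()))
```

A query resembling the family scores higher than an unrelated one, which is what makes profile searches more sensitive than single-sequence comparison.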

22
Multiple Sequence Searching Using a Motif
  • A different approach for incorporating multiple
    sequence information into a database search is to
    use a MOTIF. Instead of giving every amino acid
    some kind of score at every position in an
    alignment, a motif ignores all but the most
    invariant positions in an alignment, and just
    describes the key residues that are conserved and
    define the family. Sometimes this is called a
    "signature".
  • For example, "H-[FW]-x-[LIVM]-x-G-x(5)-[LV]-H-x(3)
    -[DE]" describes a family of DNA binding
    proteins. It can be translated as "histidine,
    followed by either phenylalanine or tryptophan,
    followed by any amino acid (x), followed by
    leucine, isoleucine, valine or methionine,
    followed by any amino acid (x), followed by
    glycine, . . . etc.".
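
A motif of this kind maps naturally onto a regular expression. The following is a minimal sketch, assuming PROSITE-style syntax in which alternatives are written in square brackets, excluded residues in curly braces, "x" means any residue and "(n)" or "(n,m)" gives a repeat count. Only this subset is handled (no "<" or ">" anchors), and the function name prosite_to_regex is illustrative.

```python
import re

def prosite_to_regex(pattern):
    """Translate a subset of PROSITE pattern syntax into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        element = element.strip()
        m = re.fullmatch(
            r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?",
            element,
        )
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # excluded residues
        if low is not None:                   # repeat count, e.g. x(5) or x(2,4)
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)
```

For the pattern quoted above this yields H[FW].[LIVM].G.{5}[LV]H.{3}[DE], which can be passed to re.search or re.finditer to scan a protein sequence.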

23
Multiple Sequence Searching Using a Motif
  • PROSITE (ExPASy Geneva) contains a huge number of
    such patterns, and several sites allow you to
    search these data:
  • ExPASy http://www.expasy.ch/tools/scnpsite.html
  • EBI http://www2.ebi.ac.uk/ppsearch/
  • It is best to search a few different databases in
    order to find as many homologues as possible. A
    very important thing to do, and one which is
    sometimes overlooked, is to compare any new
    sequence to a database of sequences for which 3D
    structure information is available. Whether or
    not the sequence is homologous to a protein of
    known 3D structure is not obvious in the output
    from many searches of large sequence databases.
    Moreover, if the homology is weak, the similarity
    may not be apparent at all during the search
    through a larger database.
  • One can save a lot of time by making use of
    pre-prepared protein alignments.

24
Web sites for Performing Multiple Alignment
  • EBI (UK) Clustalw Server
  • http://www2.ebi.ac.uk/clustalw/
  • IBCP (France) Multalin Server
  • http://www.ibcp.fr/multalin.html
  • IBCP (France) Clustalw Server
  • IBCP (France) Combined Multalin/Clustalw
  • MSA (USA) Server
  • http://www.ibc.wustl.edu/ibc/msa.html
  • BCM Multiple Sequence Alignment ClustalW Server
  • http://dot.imgen.bcm.tmc.edu:9331/multi-align/Options/clustalw.html

25
Some Tips for Sequence Alignment
  • Don't just take everything found in the searches
    and feed it directly into the alignment
    program. Searches will almost always return
    matches that do not indicate a significant
    sequence similarity. Look through the output
    carefully and throw things out if they don't
    appear to be a member of the sequence family.
    Inclusion of non-members in the alignment will
    confuse things and likely lead to errors later.
  • Remember that the programs for aligning sequences
    aren't perfect, and do not always provide the
    best alignment. This is particularly so for
    large families of proteins with low sequence
    identities. If a better way of aligning the
    sequences is discovered, then by all means edit
    the alignment manually.

26
Locating Domains
  • If the sequence has more than about 500 amino
    acids, it is almost certain that it will be
    divided into discrete functional domains. If
    possible, it is preferable to split such large
    proteins up and consider each domain separately.
    One can predict the location of domains in a few
    different ways. The methods below are given
    (approximately) from the most to the least
    confident.
  • If homology to other sequences occurs only over a
    portion of the probe sequence and the other
    sequences are whole (i.e. not partial sequences),
    then this provides the strongest evidence for
    domain structure. Either run complete database
    searches or make use of pre-defined databases of
    protein domains. Searches of these databases
    (see links below) will often assign domains
    easily.

27
Locating domains
  • Regions of low-complexity often separate domains
    in multi-domain proteins. Long stretches of
    repeated residues, particularly proline,
    glutamine, serine or threonine, often indicate
    linker sequences and are usually a good place to
    split proteins into domains.
  • Low complexity regions can be defined using the
    program SEG which is generally available in most
    BLAST distributions or web servers.
  • Transmembrane segments are also very good
    dividing points, since they can easily separate
    extracellular from intracellular domains.

28
Locating Domains
  • Something else to consider is the presence of
    coiled-coils. These unusual structural features
    sometimes (but not always) indicate where
    proteins can be divided into domains.
  • Secondary structure prediction methods will often
    predict regions of proteins to have different
    protein structural classes. For example, one
    region of a sequence may be predicted to contain
    only α helices and another to contain only β
    sheets. These can often, though not always,
    suggest likely domain structure.
  • If a sequence has been separated into domains,
    then it is very important to repeat all the
    database searches and alignments using the
    domains separately. Searches with sequences
    containing several domains may not find all
    sub-homologies, particularly if the domains are
    abundant in the database (e.g. kinases, SH2
    domains, etc.).

29
Domain Assignment
30
Locating Domains by Web Sites
  • SMART (Oxford/EMBL)
  • http://smart.embl-heidelberg.de/
  • PFAM (Sanger Centre/Wash-U/Karolinska Institutet)
  • http://www.sanger.ac.uk/Software/Pfam/search.shtml
  • COGS (NCBI)
  • PRINTS (UCL/Manchester)
  • BLOCKS (Fred Hutchinson Cancer Research Center,
    Seattle)
  • http://blocks.fhcrc.org/blocks/blocks_search.html
  • SBASE (ICGEB, Trieste)
  • Domain descriptions can also be located in the
    annotations in SWISSPROT.

31
(No Transcript)
32
P68 RNA Helicase
  • ssyssdrdr grdrgfgapr fggsrtgpls gkkfgnpgek
    lvkkkwnlde lpkfeknfyq ehpdlarrta qevdtyrrsk
    eitvrghncp kpvlnfyean fpanvmdvia rhnfteptai
  • qaqgwpvals gldmvgvaqt gsgktlsyll paivhinhhp
    flergdgpic lvlaptrela qqvqqvaaey cracrlkstc
    iyggapkgpq irdlergvei ciatpgrlid flecgktnlr
    rttylvldea drmldmgfep qirkivdqir pdrqtlmwsa
    twpkevrqla edflkdyihi nigalelsan hnilqivdvc
    hdvekdekli rlmeeimsek enktivfvet krrcdeltrk
    mrrdgwpamg ihgdksqqer dwvlnefkhg kapiliatdv
    asrgldvedv kfvinydypn ssedyihrig rtarstktgt
    aytfftpnni kqvsdlisvl reanqainpk llqlvedrgs
  • grsrgrggmk ddrrdrysag krggfntfrd renydrgysn
    llkrdfgakt qngvysaany tngsfgsnfv sagiqtsfrt
    gnptgtyqng ydstqqygsn vanmhngmnq qayaypvpqp
  • apmigypmpt gysq 614 aa
  • f015812 (GenBank)

33
(No Transcript)
34
Sequence Alignment of p68 to DEAD Proteins
  • Walker A (AXTGSGKT): Walker A motif for ATP
    binding
  • DEAD: ATP binding, ATP hydrolysis
  • SAT: transmission of energy from ATP to unwind
    RNA
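
Conserved motifs such as these can be located in a sequence with a simple pattern search. The sketch below writes the Walker A consensus AXTGSGKT as a regular expression (X taken to mean any residue) and applies it to a short fragment copied from the p68 sequence shown on the earlier slide. The helper name find_motif is illustrative, and positions are 0-based within the fragment, not within the full protein.

```python
import re

def find_motif(regex, sequence):
    """Return (0-based start, matched text) for each non-overlapping match."""
    return [(m.start(), m.group())
            for m in re.finditer(regex, sequence.upper())]

# Fragment taken from the p68 sequence listed on the P68 RNA Helicase slide
P68_FRAGMENT = "qaqgwpvalsgldmvgvaqtgsgktlsyll"

WALKER_A = "A.TGSGKT"  # AXTGSGKT, with X written as "." (any residue)

walker_a_hits = find_motif(WALKER_A, P68_FRAGMENT)
```

Here walker_a_hits contains the single match AQTGSGKT; the same helper can be used with the literal strings DEAD and SAT.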
35
P68 RNA Helicase
36
Comparative or Homology Modeling
  • If the protein sequence shows significant
    homology to another protein of known
    three-dimensional structure, then a fairly
    accurate model of the protein 3D structure can be
    obtained via homology modeling.
  • It is also possible to build models if one has
    found a suitable fold via fold recognition and is
    satisfied with the alignment of sequence to
    structure (Note that the accuracy of models
    constructed in this manner has not been assessed
    properly, so treat with caution).

37
Comparative or Homology Modeling
  • It is now possible to generate models
    automatically using the very useful SWISS-MODEL
    server. Sending in a protein sequence alone is
    appropriate only when the degree of sequence
    homology is high (50% or greater). It is best,
    particularly if one has edited an alignment, to
    send an alignment directly to the server.
  • http://www.expasy.ch/swissmod/SWISS-MODEL.html
  • Some other sites useful for homology modeling
    include:
  • WHAT IF (G. Vriend, EMBL, Heidelberg)
  • http://www.cmbi.kun.nl/whatif/
  • MODELLER (A. Sali, Rockefeller University)
  • http://guitar.rockefeller.edu/modeller/modeller.html
  • MODELLER Mirror FTP site

38
(No Transcript)
39
Swiss-Model of P68 Based on EIF-4A
DEAD
SAT
Walker A AQSGTGKT
  • EIF-4A is the initiation factor (1QAV) with 1.8 Å
    resolution.

40
(No Transcript)
41
Methods for Single Sequences
  • Secondary structure prediction has been around
    for almost a quarter of a century. The early
    methods suffered from a lack of data.
    Predictions were performed on single sequences
    rather than families of homologous sequences, and
    there were relatively few known 3D structures
    from which to derive parameters. Probably the
    most famous early methods are those of Chou &
    Fasman, Garnier, Osguthorpe & Robson (GOR) and
    Lim.
  • Although the authors originally claimed quite
    high accuracies (70-80%), under careful
    examination, the methods were shown to be only
    between 56% and 60% accurate (Kabsch & Sander,
    1983). An early problem in secondary structure
    prediction had been the inclusion of structures
    used to derive parameters in the set of
    structures used to assess the accuracy of the
    method.
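
Accuracy figures like those above are usually per-residue percentages over three states (helix, strand, other), a measure commonly referred to as Q3. The following is a minimal sketch of that calculation, assuming the predicted and observed states are given as equal-length strings using H, E and C; the function name q3_accuracy is illustrative.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted state matches the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("state strings must be non-empty and equal in length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

To avoid the problem described above, the proteins used to compute this figure should not include any that were used to derive the method's parameters.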

42
Methods for Single Sequences
  • Early methods on single sequences
  • Chou, P.Y. & Fasman, G.D. (1974). Biochemistry,
    13, 211-222.
  • Lim, V.I. (1974). Journal of Molecular Biology,
    88, 857-872.
  • Garnier, J., Osguthorpe, D.J. & Robson, B.
    (1978). Journal of Molecular Biology, 120, 97-120.
  • Kabsch, W. & Sander, C. (1983). FEBS Letters,
    155, 179-182. (An assessment of the above
    methods)
  • Later methods on single sequences
  • Deleage, G. & Roux, B. (1987). Protein
    Engineering, 1, 289-294. (DPM)
  • Presnell, S.R., Cohen, B.I. & Cohen, F.E. (1992).
    Biochemistry, 31, 983-993.
  • Holley, L.H. & Karplus, M. (1989). Proceedings of
    the National Academy of Sciences, 86, 152-156.
  • King, R.D. & Sternberg, M.J.E. (1990). Journal of
    Molecular Biology, 216, 441-457.
  • Kneller, D.G., Cohen, F.E. & Langridge, R. (1990).
    Improvements in Protein Secondary Structure
    Prediction by an Enhanced Neural Network. Journal
    of Molecular Biology, 214, 171-182. (NNPRED)

43
(No Transcript)
44
Assignment of Amino Acids
45
Frequency of Occurrence of Amino Acids in the β
Turns
46
(No Transcript)
47
(No Transcript)
48
(No Transcript)
49
Secondary Structure Prediction Methods Links
  • There are now many web servers for structure
    prediction; here is a quick summary:
  • PSI-pred (PSI-BLAST profiles used for prediction;
    David Jones, Warwick)
  • JPRED consensus prediction (Cuff & Barton, EBI)
  • http://barton.ebi.ac.uk/servers/jpred.html
  • PREDATOR: Frishman & Argos (EMBL)
  • http://www.embl-heidelberg.de/cgi/predator_serv.pl
  • PHD home page: Rost & Sander, EMBL, Germany
  • http://www.embl-heidelberg.de/predictprotein/predictprotein.html
  • ZPRED server: Zvelebil et al., Ludwig, U.K.
  • http://kestrel.ludwig.ucl.ac.uk/zpred.html (GOR)
  • nnPredict: Cohen et al., UCSF, USA.
  • http://www.cmpharm.ucsf.edu/nomi/nnpredict.html
  • BMERC PSA Server: Boston University, USA
  • http://bmerc-www.bu.edu/psa/
  • SSP (Nearest-neighbor): Solovyev and Salamov,
    Baylor College, USA.
  • http://dot.imgen.bcm.tmc.edu:9331/pssprediction/pssp.html

50
Recent Improvements
  • The availability of large families of homologous
    sequences revolutionized secondary structure
    prediction.
  • Traditional methods, when applied to a family of
    proteins rather than a single sequence, proved
    much more accurate at identifying core secondary
    structure elements. The combination of sequence
    data with sophisticated computing techniques such
    as neural networks has led to accuracies well in
    excess of 70%. Though this seems a small
    percentage increase, these predictions are
    actually much more useful than those for single
    sequence, since they tend to predict the core
    accurately.
  • Moreover, the limit of 70-80% may be a
    function of secondary structure variation within
    homologous proteins.

51
(No Transcript)
52
Automated Methods
  • There are numerous automated methods for
    predicting secondary structure from multiply
    aligned protein sequences. Some good references
    are
  • Zvelebil, M.J.J.M., Barton, G.J., Taylor, W.R. &
    Sternberg, M.J.E. (1987). Prediction of Protein
    Secondary Structure and Active Sites Using the
    Alignment of Homologous Sequences. Journal of
    Molecular Biology, 195, 957-961. (ZPRED)
  • Rost, B. & Sander, C. (1993). Prediction of
    protein secondary structure at better than 70%
    accuracy. Journal of Molecular Biology, 232,
    584-599. (PHD)
  • Salamov, A.A. & Solovyev, V.V. (1995). Prediction
    of protein secondary structure by combining
    nearest-neighbor algorithms and multiple sequence
    alignments. Journal of Molecular Biology, 247, 1
    (NNSSP)
  • Geourjon, C. & Deleage, G. (1994). SOPM: a self
    optimised prediction method for protein secondary
    structure prediction. Protein Engineering, 7,
    157-16. (SOPMA)
  • Solovyev, V.V. & Salamov, A.A. (1994). Predicting
    alpha-helix and beta-strand segments of globular
    proteins. Computer Applications in the
    Biosciences, 10, 661-669. (SSP)
  • Wako, H. & Blundell, T.L. (1994). Use of
    amino-acid environment-dependent substitution
    tables and conformational propensities in
    structure prediction from aligned sequences of
    homologous proteins. 2. Secondary Structures.
    Journal of Molecular Biology, 238, 693-708.
  • Mehta, P., Heringa, J. & Argos, P. (1995). A
    simple and fast approach to prediction of protein
    secondary structure from multiple aligned
    sequences with accuracy above 70%. Protein
    Science, 4, 2517-2525. (SSPRED)
  • King, R.D. & Sternberg, M.J.E. (1996).
    Identification and application of the concepts
    important for accurate and reliable protein
    secondary structure prediction. Protein Science,
    5, 2298-2310. (DSC)

53
(No Transcript)
54
PHD Prediction of rCD2
55
Comparison Between Prediction and X-ray
56
Manual Intervention
  • It has long been recognized that patterns of
    residue conservation are indicative of particular
    secondary structure types.
  • Alpha helices have a periodicity of 3.6 residues
    per turn, which means that for helices with one
    face buried in the protein core and the other
    exposed to solvent, the residues at positions i,
    i+3, i+4 and i+7 (where i is a residue in an
    α helix) will lie on one face of the helix. Many
    alpha helices in proteins are amphipathic,
    meaning that one face is pointing towards the
    hydrophobic core and the other towards the
    solvent. Thus patterns of hydrophobic residue
    conservation showing the i, i+3, i+4, i+7 pattern
    are highly indicative of an alpha helix.
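
The reason these particular offsets share a face follows from the geometry: 3.6 residues per turn corresponds to a rotation of 100 degrees per residue around the helix axis. The sketch below, a helical-wheel calculation under that idealized assumption, reports the angular position of each offset relative to residue i; the function name is illustrative.

```python
def helical_wheel_angles(offsets, degrees_per_residue=100.0):
    """Angle of residue i+k around the helix axis relative to residue i,
    expressed in the range (-180, 180] degrees."""
    angles = {}
    for k in offsets:
        angle = (k * degrees_per_residue) % 360.0
        if angle > 180.0:
            angle -= 360.0  # report as a signed angle either side of residue i
        angles[k] = angle
    return angles
```

Offsets 0, 3, 4 and 7 come out at 0, -60, 40 and -20 degrees, all within a 100-degree arc, whereas offsets 1, 2, 5 and 6 fall at 100, -160, 140 and -120 degrees on the rest of the wheel.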

57
Pattern in Amphipathic Helix
  • For example, this helix in myoglobin has a
    classic pattern of hydrophobic and polar residue
    conservation (i 1).

58
Pattern in Amphipathic Beta Strand
  • The geometry of beta strands means that adjacent
    residues have their side chains pointing in
    opposite directions.
  • Beta strands that are half buried in the protein
    core will tend to have hydrophobic residues at
    positions i, i+2, i+4, i+8, etc., and polar
    residues at positions i+1, i+3, i+5, etc.
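
This alternation can be quantified for a candidate segment. The following is a minimal sketch, assuming a simple two-class split of residues into hydrophobic and polar; the membership of the HYDROPHOBIC set is an assumption made for illustration, as is the function name.

```python
# Residues treated as hydrophobic in this sketch (an assumed classification)
HYDROPHOBIC = set("AVLIMFWC")

def alternation_score(segment):
    """Fraction of adjacent residue pairs in which hydrophobic/polar character flips."""
    flags = [aa in HYDROPHOBIC for aa in segment.upper()]
    if len(flags) < 2:
        return 0.0
    flips = sum(a != b for a, b in zip(flags, flags[1:]))
    return flips / (len(flags) - 1)
```

A score near 1 over five or more residues is consistent with a half-buried strand, while a run of hydrophobic residues scores 0. In practice the calculation is more informative when applied to the conserved character of alignment columns than to a single sequence.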

59
Pattern in Buried Beta Strand
  • Beta strands that are completely buried (as is
    often the case in proteins containing both alpha
    helices and beta strands) usually contain a run
    of hydrophobic residues, since both faces are
    buried in the protein core.

60
(No Transcript)
61
(No Transcript)
62
(No Transcript)
63
(No Transcript)
64
(No Transcript)
65
Secondary Structure Prediction of CD2
66
CD2 vs. Helical Propensity
  • Residues on strands C, C', C'' and G have strong
    helical propensity

67
  • Three automated secondary structure predictions
    (PHD, SOPMA and SSPRED) appear below the
    alignment of 12 glutamyl tRNA reductase
    sequences.
  • Positions within the alignment showing a
    conservation of hydrophobic side-chain character
    are shown in yellow, and those showing near total
    conservation of non-hydrophobic residues (often
    indicative of active sites) are colored green.

68
  • Predictions of accessibility performed by PHD
    (PHD Acc. Pred.) are also shown (b = buried, e =
    exposed).
  • For example, positions (within the alignment)
    38-45 exhibit the classical amphipathic helix
    pattern of hydrophobic residue conservation, with
    positions i, i+3, i+4 and i+7 showing a
    conservation of hydrophobicity and the
    intervening positions being mostly polar.
  • Positions 13 - 16 comprise a short stretch of
    conserved hydrophobic residues, indicative of a
    buried beta-strand.

69
Alignment of Sequence to Tertiary Structure
  • Remember that the alignments of sequence to
    tertiary structure that one gets from fold
    recognition methods may be inaccurate. In
    instances where one has identified a remote
    homologue, then the fold recognition methods can
    sometimes give a very accurate alignment, though
    it is still sometimes fruitful to edit the
    alignment around variable regions.
  • In other cases, it may be wise to create an
    alignment by starting with the alignment from the
    fold recognition method, and considering the
    alignment of secondary structures.

70
Alignment of Sequence to Tertiary Structure
  • There is one suggested method by Dr. Robert B.
    Russell:
  • Ensure that residues predicted to be
    buried/exposed align to those known to be buried
    or exposed in the template structure. Note that
    conserved hydrophobic/polar residues are more
    likely to be buried/exposed than non-conserved
    residues, which could simply be anomalies. One
    can predict residue accessibility manually, or by
    use of an automated server like PHD.
  • Ensure that critical hydrogen bonding patterns
    are not disrupted in beta-sheet structures.
  • Attempt to conserve residue properties (i.e.
    size, polarity, hydrophobicity) as best as
    possible across known and unknown structure.
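
The first check in this method can be turned into a simple score for comparing candidate alignments. This is a minimal sketch, assuming the predicted burial of the query and the known burial of the template are written as strings aligned column by column, using b for buried and e for exposed; any other character (gaps, unknowns, half-buried positions) is skipped. The function name and encoding are assumptions for illustration.

```python
def burial_agreement(predicted, template):
    """Fraction of comparable columns where predicted and known burial agree."""
    if len(predicted) != len(template):
        raise ValueError("burial strings must be aligned to the same length")
    compared = 0
    matches = 0
    for p, t in zip(predicted, template):
        if p not in ("b", "e") or t not in ("b", "e"):
            continue  # skip gaps and positions without a clear state
        compared += 1
        if p == t:
            matches += 1
    return matches / compared if compared else 0.0
```

When editing an alignment by hand, a shift that raises this fraction without breaking the beta-sheet hydrogen bonding pattern is usually worth keeping.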

71
Things That Need to Be Considered
  • In the construction of an alignment, several
    things need to be considered:
  • The observed residue burial or exposure
  • The predicted residue burial or exposure
  • The conservation of residue properties in
    known and unknown structures
  • Whether or not the side chains on the core
    beta-strands point in towards the barrel or
    out towards the helices
  • The hydrogen bonding pattern of the
    beta-strands comprising the core beta-barrel.

72
Alignment of the Prediction of the Glutamyl tRNA
Reductases (hemA) with an Alpha/beta Barrel
Structure (2acs)
73
Alignment of the Prediction of the Glutamyl tRNA
Reductases (hemA) with an Alpha/beta Barrel
Structure (2acs)
  • Sec. = known secondary structure from PDB code
    2ACS (E = extended, H = alpha helix, G = 3-10
    helix, B = beta-bridge)
  • Bur. = known residue exposure for 2ACS (b =
    buried, h = half-buried, e = exposed); in/out =
    positioning of residues in the beta-barrel (i =
    pointing inwards, o = pointing outwards)
  • Res. cons = conservation of residues (totally
    conserved = UPPER CASE, h = hydrophobic, p =
    polar, c = charged, a = aromatic, s = small, - =
    negative, + = positive)
  • Pred = predicted burial and secondary structure
    for the glutamyl tRNA reductase family
  • Boxed positions are those with the same
    known/predicted burial. Shaded positions show a
    conservation of hydrophobic character in BOTH
    families of proteins, and positions in inverse
    text show a conservation of polar character in
    BOTH families.