Tools for Protein Informatics - PowerPoint PPT Presentation

Provided by: jesses6

1
Tools for Protein Informatics
  • Proteomics: qualitative and quantitative
    comparison of a proteome (the complete set of
    proteins of an organism) under different
    conditions to unravel biological processes.
  • Sequence and structure comparison
  • Multiple alignments / phylogenetic trees
  • Composition, pI and mass analysis
  • Motif and pattern identification
  • Secondary structure prediction, transmembrane (TM)
    prediction, etc.
  • Threading / Homology Modelling
  • Visualization and interpretation

2
Using Multiple Alignments
  • Screening for membership in a family or
    superfamily.
  • Identification of conserved elements important to
    or defining function.
  • Identification of highly variable regions.
  • Distinguishing global vs. local patterns of
    similarity characteristic of the structural
    scaffold.
  • Creation of a consensus sequence or profile.
  • Determination of the level and sites of
    variability across the members of subgroups /
    families / superfamilies.
  • Can confirm distant relationships: if A is
    homologous to B and B is homologous to C, then A
    is homologous to C, even when A and C show little
    direct sequence similarity.
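Several of the uses listed above (conserved elements, variable regions, consensus) come down to simple per-column statistics. The sketch below, assuming a toy hypothetical alignment with "-" for gaps and a majority-rule threshold of 0.5, reports the most common symbol and its frequency for each column:

```python
from collections import Counter

def column_summary(alignment):
    """For each column, return (most common symbol, fraction of rows carrying it)."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    summary = []
    for i in range(length):
        column = [row[i] for row in alignment]
        symbol, count = Counter(column).most_common(1)[0]
        summary.append((symbol, count / len(column)))
    return summary

def consensus(alignment, threshold=0.5):
    """Majority-rule consensus; 'x' marks columns where no symbol exceeds the threshold."""
    return "".join(sym if frac > threshold else "x"
                   for sym, frac in column_summary(alignment))

# Toy, hypothetical alignment of three short sequences ('-' = gap).
aln = ["MKVLA-W", "MKILAGW", "MRVLS-W"]
print(consensus(aln))                 # MKVLA-W
conserved = [i for i, (_, frac) in enumerate(column_summary(aln)) if frac == 1.0]
print(conserved)                      # [0, 3, 6]
```

Fully conserved columns (fraction 1.0) are candidates for functionally important positions; raising the threshold turns more columns into "x", exposing the variable regions.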

3
Multiple Alignment
  • Definition
  • Multiple alignments are built up from pairwise
    alignments.
  • More.

4
Progressive Alignment Methods
  • Feng and Doolittle (1987)
  • Compare all sequences pairwise.
  • There are N(N-1)/2 pairs for N sequences.
  • Perform a cluster analysis on the pairwise
    data to form a guide tree.
  • Build the multiple alignment by first aligning
    the most similar pair of sequences, then the
    next most similar, and so on.
  • Compare sequence to sequence, sequence to
    alignment, or alignment to alignment.
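The first two steps (all pairwise comparisons, then clustering into a guide tree) can be sketched as follows. This is an illustration, not Feng and Doolittle's exact procedure: it assumes `1 - difflib.SequenceMatcher.ratio()` as a crude stand-in for an alignment-based distance and uses UPGMA-style average linkage as the clustering method.

```python
from difflib import SequenceMatcher
from itertools import combinations

def pairwise_distances(seqs):
    """Distances for all N*(N-1)/2 pairs (crude string similarity, an assumption)."""
    return {(i, j): 1.0 - SequenceMatcher(None, seqs[i], seqs[j]).ratio()
            for i, j in combinations(range(len(seqs)), 2)}

def guide_tree_merges(seqs):
    """Average-linkage (UPGMA-style) clustering; returns merges, most similar first.
    Each cluster is a tuple of sequence indices."""
    dist = pairwise_distances(seqs)

    def cluster_distance(a, b):
        # Average over all cross-cluster sequence pairs.
        values = [dist[(min(x, y), max(x, y))] for x in a for y in b]
        return sum(values) / len(values)

    clusters = [(i,) for i in range(len(seqs))]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        merges.append((a, b))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return merges

seqs = ["MKVLAWG", "MKVLSWG", "PQRSTNE"]   # toy sequences
print(len(pairwise_distances(seqs)))       # 3, i.e. 3 * 2 / 2
print(guide_tree_merges(seqs))             # [((0,), (1,)), ((2,), (0, 1))]
```

The merge order is the order in which the progressive method would align: the two most similar sequences first, then the remaining sequence against that small alignment.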

5
Progressive Alignment Methods II
  • 1. Alignment-to-sequence comparisons are
    accomplished by pairwise comparisons.
  • Then adjust the positions of indels in ALL
    sequences.
  • 2. Sequences are aligned in order of decreasing
    similarity.
  • Several algorithms can be used to construct the
    guide tree (e.g. the neighbor-joining clustering
    method).
  • Sequences are weighted according to their
    relatedness in the guide tree.
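One way to derive weights from the guide tree, in the style of ClustalW, is to share each branch length equally among the sequences below it and sum those shares along each sequence's path to the root, so that clusters of near-duplicates are down-weighted. The nested-tuple tree layout and the branch lengths below are assumptions made for this sketch.

```python
def count_leaves(node):
    label, _ = node
    return 1 if isinstance(label, str) else sum(count_leaves(child) for child in label)

def tree_weights(node, inherited=0.0, weights=None):
    """Sequence weight = sum over branches on the root-to-leaf path of
    (branch length / number of sequences below that branch).

    A node is (name, branch_length) for a leaf or ([children], branch_length)
    for an internal node (an assumed layout for this sketch)."""
    if weights is None:
        weights = {}
    label, length = node
    share = inherited + length / count_leaves(node)
    if isinstance(label, str):
        weights[label] = share
    else:
        for child in label:
            tree_weights(child, share, weights)
    return weights

# Hypothetical tree: A and B are close relatives; C sits alone on a long branch.
tree = ([([("A", 0.1), ("B", 0.1)], 0.2), ("C", 0.3)], 0.0)
print(tree_weights(tree))   # A and B ~0.2 each, C ~0.3
```

The shared 0.2 branch above A and B is split between them, so together they do not outweigh the single divergent sequence C.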

6
Progressive Alignment Methods III
  • The substitution matrix can be chosen on the fly
    according to the similarity of the sequences
    being joined at each node of the guide tree.
  • 6. Once a group has been aligned, one should
    use the position specific information from the
    multiple alignment to align the next sequence.
  • 7. The amount of sequence conservation and
    gapping at any position along the alignment is
    used to help determine mismatch penalties and gap
    opening and extension penalties, so parameters
    vary at different positions and in different
    cycles.

7
Gap Penalties
  • Lower the gap penalties at positions where gaps
    already occur.
  • Increase gap penalties adjacent to positions
    where gaps already occur.
  • Reduce gap penalties where stretches of
    hydrophilic residues occur.
  • Increase or decrease gap penalties using tables
    of the observed frequencies of gaps adjacent to
    each of the 20 amino acids.
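The first three rules can be sketched as position-specific gap-opening penalties computed from the current alignment. Everything numeric here is an assumption for illustration: the base penalty, the scaling factors (0.3, 2.0, 2/3), the neighbourhood size, the minimum hydrophilic run length and the hydrophilic residue set are not published parameter values. The fourth rule (per-residue gap frequency tables) is omitted.

```python
# Assumed hydrophilic residue set for this sketch.
HYDROPHILIC = set("DEGKNPQRS")

def in_hydrophilic_run(row, i, min_run):
    """True if position i lies inside a run of >= min_run hydrophilic residues."""
    if row[i] not in HYDROPHILIC:
        return False
    lo, hi = i, i
    while lo > 0 and row[lo - 1] in HYDROPHILIC:
        lo -= 1
    while hi < len(row) - 1 and row[hi + 1] in HYDROPHILIC:
        hi += 1
    return hi - lo + 1 >= min_run

def gap_open_penalties(alignment, base=10.0, near=4, min_run=5):
    """Position-specific gap-opening penalties (illustrative factors only)."""
    length = len(alignment[0])
    gapped = [any(row[i] == "-" for row in alignment) for i in range(length)]
    penalties = []
    for i in range(length):
        p = base
        if gapped[i]:
            p *= 0.3          # rule 1: lower where gaps already occur
        elif any(gapped[max(0, i - near):i + near + 1]):
            p *= 2.0          # rule 2: raise next to existing gaps
        if any(in_hydrophilic_run(row, i, min_run) for row in alignment):
            p *= 2 / 3        # rule 3: lower inside hydrophilic stretches
        penalties.append(p)
    return penalties

print(gap_open_penalties(["LLLL-LLLL", "LLLLALLLL"], near=1))
```

The effect is to encourage new gaps to fall in columns that are already gapped (and in hydrophilic, probably loop, regions) rather than next to existing gaps.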

8
Profiles
  • Information in a multiple alignment is
    represented quantitatively as a table of
    position-specific symbol comparison values and
    gap penalties.
  • HMM (Hidden Markov Model): a probability-based
    model for describing an alignment.
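A minimal profile in the sense described above is a table of per-column log-odds scores plus per-column gap information. This sketch assumes a uniform 1/20 background, an add-one pseudocount and a toy alignment; real profile tools use observed background frequencies and more careful sequence weighting.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(alignment, pseudocount=1.0):
    """Per column: (log2-odds score for each amino acid, fraction of gaps)."""
    background = 1.0 / len(AMINO_ACIDS)       # assumed uniform background
    profile = []
    for i in range(len(alignment[0])):
        column = [row[i] for row in alignment]
        residues = [c for c in column if c != "-"]
        counts = Counter(residues)
        total = len(residues) + pseudocount * len(AMINO_ACIDS)
        scores = {aa: math.log2((counts[aa] + pseudocount) / total / background)
                  for aa in AMINO_ACIDS}
        gap_fraction = (len(column) - len(residues)) / len(column)
        profile.append((scores, gap_fraction))
    return profile

prof = build_profile(["ACD", "ACE", "A-D"])   # toy alignment
print(round(prof[0][0]["A"], 2), prof[1][1])  # positive score for A; 1/3 gaps in column 2
```

The gap fraction is one simple quantity a profile-based aligner could use to scale gap penalties column by column.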

9
Cautions
  • Information OUT directly depends on information
    IN.
  • Overrepresentation of a subset of sequences to be
    aligned may bias the inference of an ordered
    series of motifs.
  • Multiple alignment is a global alignment.
    Extraneous non-homologous sequence will interrupt
    the alignment of homologous regions.

10
Precomputed Multiple Alignments of Protein
Families
  • Pfam http://pfam.wustl.edu/
    • HMMs
  • ProDom http://protein.toulouse.inra.fr/prodom.html
    • PSI-BLAST
  • Systers http://www.dkfz-heidelberg.de/tbi/services/documentation/systershelp.html
  • MetaFam http://metafam.ahc.umn.edu/
    • Functional assignments

11
Examples
  • ProDom
  • Family PD000061:
    WD-REPEAT CONTAINING FACTOR COMPLETE PROTEOME
    TRP-ASP SUBUNIT INITIATION
  • http://prodes.toulouse.inra.fr/prodom/2002.1/cgi-bin/request.pl?questionDBENqueryPD000061
  • Pfam
  • Family PF00400:
    WD domain, G-beta repeat
  • http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF00400

12
Secondary Information from Precomputed Alignments
  • Related sequences, related structures, related
    articles, summaries, etc.
  • InterPro ProSite CATH
  • Pfam Prints SCOP
  • ProDom COG MSD
  • CD Search PDB
  • Blocks Systers
  • Homstrad Pandit

13
Applications for Motif Analysis
  • Identification of very distant homologs.
  • May point to important functional units in a
    protein.
  • Can be used to anchor or break up a multiple
    alignment.
  • Databases of motifs can be used to develop
    other informatics applications.

14
ProSite Protein Family Signatures
  • http://tw.expasy.org/prosite/
  • Signatures include Motifs
  • Documentation
  • Consensus pattern
  • [LIVMSTAGC]-x(2)-[DN]-x(2)-[LIVMWSTAC]-x-
    [LIVMFSTAG]-W-[DEN]-[LIVMFSTAGCN]
  • Sequences known to belong to the class
  • References
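PROSITE consensus patterns map almost directly onto regular expressions: `[..]` is a residue class, `{..}` excludes residues, `x` is any residue, and `(n)` or `(n,m)` is a repeat count. The converter below is a sketch that handles only those elements (not the `<`/`>` terminus anchors), applied to the WD-repeat pattern with its bracketed residue classes written out; the test sequence is made up.

```python
import re

_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element.strip())
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":
            regex = "."                          # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"      # any residue except these
        else:
            regex = core                         # single residue or [..] class
        if lo is not None:
            regex += "{" + lo + (("," + hi) if hi else "") + "}"
        parts.append(regex)
    return "".join(parts)

wd = "[LIVMSTAGC]-x(2)-[DN]-x(2)-[LIVMWSTAC]-x-[LIVMFSTAG]-W-[DEN]-[LIVMFSTAGCN]"
print(prosite_to_regex(wd))
hit = re.search(prosite_to_regex(wd), "MKTLAADGGLALWDLPQ")   # made-up sequence
print(hit.start(), hit.group())                              # 3 LAADGGLALWDL
```

A pattern hit is a yes/no answer; unlike a profile or PSSM it gives no graded score, which is why signatures are usually documented with their known true and false positives.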

15
MEME and MAST
  • http://meme.sdsc.edu/meme/website
  • MEME: motif discovery tool; MAST: motif search
    tool.
  • Motifs (highly conserved regions) are
    represented as position-dependent letter
    probability matrices, which describe the
    probability of each possible letter at each
    position in the pattern.
  • PSSM: position-specific scoring matrix.
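The link between a letter-probability matrix and a PSSM can be shown in a few lines: convert each probability to a log-odds score against a background, then slide the matrix along a sequence and sum the scores. This is a simplified sketch, not MAST's actual scoring statistics; the uniform background, the pseudocount and the three-position motif are assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def pssm_from_probabilities(prob_matrix, pseudocount=0.01):
    """Letter-probability matrix (one dict per motif position, summing to 1)
    -> log2-odds PSSM against an assumed uniform background."""
    background = 1.0 / len(AMINO_ACIDS)
    norm = 1.0 + pseudocount * len(AMINO_ACIDS)
    return [{aa: math.log2((column.get(aa, 0.0) + pseudocount) / norm / background)
             for aa in AMINO_ACIDS}
            for column in prob_matrix]

def best_hit(sequence, pssm):
    """Slide the PSSM along the sequence; return (start, score) of the best window."""
    width = len(pssm)
    hits = [(start, sum(pssm[k][sequence[start + k]] for k in range(width)))
            for start in range(len(sequence) - width + 1)]
    return max(hits, key=lambda hit: hit[1]) if hits else None

# Hypothetical motif: W, then D or E, then L.
pssm = pssm_from_probabilities([{"W": 1.0}, {"D": 0.5, "E": 0.5}, {"L": 1.0}])
print(best_hit("AAAWDLAAA", pssm))   # best window starts at position 3
```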

17
Protein Sequence Predictions
  • Secondary structure
  • (alpha helix, beta sheet, turns and coils)
  • Transmembrane regions
  • Coiled-coil
  • Solvent accessibility (exposed vs. buried)
  • Antigenicity
  • More

18
Chou-Fasman
  • Biochemistry (1974) 13(2), 222.
  • Predicts helix, beta, reverse turn, or none.
  • Accuracy: 77%.
  • Based on short-range and medium-range residue
    interactions.
  • Training set used to determine secondary
    structure potential of the amino acids.
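The "secondary structure potential" derived from a training set is a ratio: how often a residue type occurs in a given state, relative to how often residues in general occur in that state. The sketch below uses one common formulation (normalizing by the overall fraction of residues in the state; the original paper's averaging may differ) on a made-up one-protein training set.

```python
from collections import Counter

def conformational_parameters(training, state="H"):
    """P(aa) = f_state(aa) / <f_state>, where f_state(aa) is the fraction of
    residues of type aa observed in `state` and <f_state> is the fraction of
    all training residues observed in `state`.

    training: list of (sequence, structure) string pairs of equal length."""
    total, in_state = Counter(), Counter()
    for seq, ss in training:
        for aa, s in zip(seq, ss):
            total[aa] += 1
            if s == state:
                in_state[aa] += 1
    overall = sum(in_state.values()) / sum(total.values())
    return {aa: (in_state[aa] / total[aa]) / overall for aa in total}

print(conformational_parameters([("AAGG", "HHCC")]))   # {'A': 2.0, 'G': 0.0}
```

Values above 1 mark "formers" of the state and values below 1 mark "breakers"; with a realistic training set these become the tabulated Chou-Fasman parameters.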

19
Chou-Fasman Amino Acid Probabilities
20
Chou-Fasman
  • Method
  • Assign each residue a secondary structure
    potential (helix, beta, etc.).
  • Locate clusters of helix formers, helix
    breakers, etc. using a sliding window.
  • Search for helical regions.
  • Search for beta regions.
  • Search for turns.
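The sliding-window search for helical regions can be sketched with the commonly tabulated Chou-Fasman helix parameters (P-alpha). This covers only helix nucleation: a window of 6 residues with at least 4 helix formers and an average P-alpha above 1.03. Treating every residue with P-alpha above 1.0 as a "former" is a simplification, and the extension, breaker and beta/turn rules are left out.

```python
# Chou-Fasman helix parameters (P-alpha) as commonly tabulated.
P_ALPHA = {"E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16, "F": 1.13,
           "Q": 1.11, "W": 1.08, "I": 1.08, "V": 1.06, "D": 1.01, "H": 1.00,
           "R": 0.98, "T": 0.83, "S": 0.77, "C": 0.70, "Y": 0.69, "N": 0.67,
           "P": 0.57, "G": 0.57}

def predict_helix(seq, window=6, min_formers=4, former_cutoff=1.0, mean_cutoff=1.03):
    """Mark ('H') every residue covered by a window that qualifies as a helix
    nucleus; all other residues are '-'. Simplified: no extension or breaker rules."""
    helix = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        values = [P_ALPHA[aa] for aa in seq[start:start + window]]
        formers = sum(v > former_cutoff for v in values)
        if formers >= min_formers and sum(values) / window > mean_cutoff:
            for k in range(start, start + window):
                helix[k] = True
    return "".join("H" if h else "-" for h in helix)

print(predict_helix("AAAAAAGGGGGG"))   # HHHHHHHH----
```

The helix spills two residues into the glycine stretch because a window with four alanines still qualifies; the full method would trim it back with breaker rules.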

21
GOR
  • J. Mol. Biol. (1978) 120, 97.
  • J. Mol. Biol. (1987) 195, 957.
  • Predicts helix, beta, reverse turn, or coil.
  • Accuracy: 60 to 70%.
  • Based on information from single residues
    rather than on residue interactions.
  • Optimized toward the expected percentage of
    each type of secondary structure.

22
GOR
23
GOR
24
GOR
  • METHOD
  • Evaluate the information content for each
    residue, for each conformational state.
  • Select the conformation with the highest
    information content.
  • Add the information content from homologous
    sequences, then divide by the number of
    sequences. This weights conserved secondary
    structural elements.
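The method steps can be sketched as: sum information values over a window around each residue for each state, average over the aligned homologs, and pick the state with the highest total. The information table below is hypothetical and ignores the offset within the window; real GOR tables give a value for every residue type at every offset of a 17-residue window. The ±2 window is also an assumption to keep the example small.

```python
# Hypothetical information values (bits); offset-independent for brevity.
TOY_INFO = {
    "H": {"A": 0.5, "E": 0.5, "L": 0.5, "G": -0.5, "P": -0.5},
    "E": {"V": 0.5, "I": 0.5, "Y": 0.5},
    "C": {"G": 0.3, "P": 0.3, "N": 0.3, "S": 0.3},
}
STATES = "HEC"

def gor_like_predict(aligned, half_window=2):
    """One predicted state per alignment column from one or more aligned sequences."""
    length = len(aligned[0])
    prediction = []
    for i in range(length):
        totals = dict.fromkeys(STATES, 0.0)
        n_seqs = 0
        for row in aligned:
            if row[i] == "-":
                continue                  # this homolog has no residue here
            n_seqs += 1
            for j in range(max(0, i - half_window), min(length, i + half_window + 1)):
                if row[j] == "-":
                    continue
                for state in STATES:
                    totals[state] += TOY_INFO[state].get(row[j], 0.0)
        if n_seqs == 0:
            prediction.append("-")
            continue
        # Divide by the number of sequences, then take the most informative state.
        averages = {s: totals[s] / n_seqs for s in STATES}
        prediction.append(max(STATES, key=averages.get))
    return "".join(prediction)

print(gor_like_predict(["AAAAAGGGGG"]))   # HHHHCCCCCC
```

Passing several aligned homologs averages their evidence, so states supported by all of them dominate the prediction.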

26
Accuracy of Predictions
  • Information OUT directly depends on information
    IN.
  • Inherent errors
  • different training sets
  • homologous proteins in training set
  • different definitions of secondary structure
  • different measures of accuracy
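The effect of the definition of secondary structure on the accuracy measure is easy to demonstrate with Q3, the fraction of residues whose predicted three-state label (H, E, C) matches the observed one. The two DSSP-to-three-state reductions below are assumptions chosen for illustration, and the strings are toy data.

```python
# Two assumed reductions of DSSP codes to 3 states; unlisted codes map to coil "C".
REDUCTIONS = {
    "broad": {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"},
    "strict": {"H": "H", "E": "E"},
}

def reduce_dssp(dssp, scheme):
    return "".join(REDUCTIONS[scheme].get(code, "C") for code in dssp)

def q3(predicted, observed):
    """Fraction of residues whose predicted 3-state label matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must be the same length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)

observed_dssp = "HHGGEEBT"   # toy DSSP assignment
predicted = "HHHHEECC"
for scheme in REDUCTIONS:
    print(scheme, q3(predicted, reduce_dssp(observed_dssp, scheme)))
# broad 0.875
# strict 0.75
```

The same prediction scores 87.5% or 75% depending only on how the observed structure was reduced to three states.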

27
Recommendations
  • Use a number of DIFFERENT programs.
  • (statistical versus neural network)
  • Use homologous sequences.
  • (homologs at varying levels of identity: 90%,
    80%, 70%, 60%, ...)
  • Allow for prediction smoothing.
  • (beware of end effects)
  • Use common sense.
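Two of these recommendations, combining DIFFERENT programs and smoothing, can be sketched as a per-residue majority vote followed by a sliding majority filter. The window that shrinks at the sequence ends is where end effects appear: terminal residues are judged on less context. The prediction strings are hypothetical program outputs, and the tie-breaking rules are choices made for this sketch.

```python
from collections import Counter

def consensus_prediction(predictions):
    """Per-residue majority vote across programs (ties go to the earliest program)."""
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    return "".join(Counter(p[i] for p in predictions).most_common(1)[0][0]
                   for i in range(length))

def smooth(prediction, half_window=1):
    """Sliding majority filter; the window shrinks at the ends (end effects).
    On a tie the residue keeps its current state."""
    out = []
    for i, state in enumerate(prediction):
        window = prediction[max(0, i - half_window):i + half_window + 1]
        counts = Counter(window)
        top = max(counts.values())
        out.append(state if counts[state] == top else counts.most_common(1)[0][0])
    return "".join(out)

print(consensus_prediction(["HHHCC", "HHCCC", "HHHCE"]))   # HHHCC
print(smooth("HHCHH"))                                      # HHHHH
```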
