Protein Feature Identification - PowerPoint PPT Presentation

Slides: 57
Provided by: Comp632

1
Protein Feature Identification
  • Microbiology 343
  • David Wishart
  • david.wishart_at_ualberta.ca

2
Objectives
  • To show that almost everything you do in the lab,
    or need to do to work with a protein, can also
    be done on a computer
  • Learning methods and algorithms for predicting
    composition and sequence features
  • Learning when to use these tools

3
Proteins
  • Exhibit far more sequence and chemical
    complexity than DNA or RNA
  • Properties and structure are defined by the
    sequence and side chains of their constituent
    amino acids
  • The engines of life
  • >95% of all drug targets are proteins
  • Favorite topic of post-genomic era

4
The Post-genomic Challenge
  • How to rapidly identify a protein?
  • How to rapidly purify a protein?
  • How to identify post-translational modifications?
  • How to find information about function?
  • How to find information about activity?
  • How to find information about location?
  • How to find information about structure?

Answer: Look at Protein Features
5
Protein Features
ACEDFHIKNMF SDQWWIPANMC ASDFDPQWERE LIQNMDKQERT QA
TRPQDS...

Sequence View Structure View
6
Different Types of Features
  • Composition Features
  • Mass, pI, Absorptivity, Rg, Volume
  • Sequence Features
  • Active sites, Binding Sites, Targeting, Location,
    Property Profiles, 2° structure
  • Structure Features
  • Supersecondary Structure, Global Fold, ASA,
    Volume

7
Where To Go
http://www.expasy.org/tools/
8
Compositional Features
  • Molecular Weight
  • Amino Acid Frequency
  • Isoelectric Point
  • UV Absorptivity
  • Solubility, Size, Shape
  • Radius of Gyration
  • Free Energy of Folding

9
Molecular Weight
10
Molecular Weight
  • Useful for SDS PAGE and 2D gel analysis
  • Useful for deciding on SEC matrix
  • Useful for deciding on MW cutoff (MWCO) for dialysis
  • Essential in synthetic peptide analysis
  • Essential in peptide sequencing (classical or
    mass-spectrometry based)
  • Essential in proteomics and high throughput
    protein characterization

11
Molecular Weight
  • Crude MW calculation: $MW = 110 \times N_{res}$
  • Exact MW calculation: $MW = \sum_i AA_i \times MW_i$
  • Remember to add 1 water (18.01 amu) after adding
    all residues
  • Note isotopic weights
  • Corrections for CHO, PO4, Acetyl, CONH2
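
The two calculations above can be sketched in Python. This is a minimal illustration, not part of the original slides: the average residue masses are standard textbook values, and the function names `crude_mw` and `exact_mw` are made up for this example. No corrections for modifications (CHO, PO4, acetyl, CONH2) are applied.

```python
# Average residue masses in Da (amino acid minus one water).
# Standard textbook values; assumed here, not taken from the slides.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.015  # one water per chain; the slide quotes 18.01 amu


def crude_mw(seq: str) -> float:
    """Crude estimate: 110 Da per residue."""
    return 110.0 * len(seq)


def exact_mw(seq: str) -> float:
    """Sum of residue masses plus one water for the free termini."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


peptide = "ACEDFHIKNMF"  # first block of the example sequence on slide 5
print(f"crude: {crude_mw(peptide):.1f} Da, exact: {exact_mw(peptide):.1f} Da")
```

The crude estimate works best for long, typical proteins; for short peptides the two numbers can differ noticeably.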

12
Amino Acid versus Residue
[Figure: structures of a free amino acid (H2N-CHR-COOH) and an amino acid residue (-NH-CHR-CO-)]
13
Protein Identification via MW
  • MOWSE
  • http://srs.hgmp.mrc.ac.uk/cgi-bin/mowse
  • CombSearch
  • http://ca.expasy.org/tools/CombSearch/
  • Mascot
  • http://www.matrixscience.com/search_form_select.html
  • AACompSim/AACompIdent
  • http://ca.expasy.org/tools/

14
Molecular Weight & Proteomics
2-D Gel QTOF Mass Spectrometry
15
Amino Acid Frequency
  • Deviations greater than 2X average indicate
    something of interest
  • High K or R indicates possible nucleoprotein
  • High C indicates a stable but hard-to-fold protein
  • High G, P, Q, or N suggests a lack of stable structure
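
A rough sketch of the 2X rule in Python follows. It only flags over-represented residues, and the uniform 5% background is a placeholder assumption; a real analysis would pass in database-wide average frequencies. The function names are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str) -> dict:
    """Fractional frequency of each of the 20 standard amino acids."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in AMINO_ACIDS}


def unusual_residues(seq: str, background=None, fold: float = 2.0) -> dict:
    """Residues whose frequency exceeds `fold` times the background."""
    if background is None:
        # Placeholder assumption: every residue expected at 5%.
        background = {aa: 1 / 20 for aa in AMINO_ACIDS}
    freq = composition(seq)
    return {aa: f for aa, f in freq.items() if f > fold * background[aa]}


# A lysine-rich toy sequence: only K is flagged.
print(unusual_residues("KKKKKKKKAG"))
```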

16
Isoelectric Point (pI)
  • The pH at which a protein has a net charge = 0
  • $Q = \sum_i N_i / (1 + 10^{pH - pK_i})$

Transcendental equation
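
Because the net-charge equation cannot be solved for pH in closed form, the pI is usually found numerically. The sketch below uses bisection. The pKa values are illustrative assumptions (published tables differ slightly), and the sign convention is also an assumption: basic groups contribute $+N_i/(1 + 10^{pH - pK_i})$ and acidic groups contribute $-N_i/(1 + 10^{pK_i - pH})$.

```python
# Illustrative pKa values; assumed for this sketch, not from the slides.
PK_POS = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PK_NEG = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Net charge Q of an unfolded chain at the given pH."""
    seq = seq.upper()
    n_pos = {"Nterm": 1, **{aa: seq.count(aa) for aa in "KRH"}}
    n_neg = {"Cterm": 1, **{aa: seq.count(aa) for aa in "DECY"}}
    q = sum(n / (1 + 10 ** (ph - PK_POS[g])) for g, n in n_pos.items())
    q -= sum(n / (1 + 10 ** (PK_NEG[g] - ph)) for g, n in n_neg.items())
    return q


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisection on pH in [0, 14]; Q decreases monotonically with pH."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positive: pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2


print(f"pI estimate: {isoelectric_point('ACEDFHIKNMF'):.2f}")
```

Like the formula itself, this ignores tertiary-structure effects on pKa, so the result is only approximate.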
17
Isoelectric Point
  • Calculation is only approximate (± 1 pH unit)
  • Does not include 3° structure interactions
  • Can be used in developing purification protocols
    via ion exchange chromatography
  • Can be used in estimating spot location for
    isoelectric focusing gels
  • Can be used to decide on best pH to store or
    analyze protein

18
UV Spectroscopy
19
UV Absorptivity
  • UV (Ultraviolet light) has a wavelength of 200 to
    400 nm
  • Most proteins and peptides (and all nucleic
    acids) absorb UV light quite strongly
  • UV spectroscopy is the most common form of
    spectroscopy performed today
  • UV spectra can be used to identify or classify
    some proteins or protein classes

20
UV Absorptivity
  • $OD_{280} = (5690 \times W + 1280 \times Y) / MW \times Conc.$
  • $Conc. = OD_{280} \times MW / (5690 \times W + 1280 \times Y)$
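
As a small worked sketch of these two formulas in Python: W and Y are taken to be the numbers of Trp and Tyr residues, a 1 cm path length is assumed, and the concentration comes out in mg/mL when MW is in g/mol. The function names are hypothetical.

```python
def molar_absorptivity(seq: str) -> int:
    """Molar absorptivity at 280 nm (M^-1 cm^-1) from Trp and Tyr counts."""
    seq = seq.upper()
    return 5690 * seq.count("W") + 1280 * seq.count("Y")


def concentration_mg_per_ml(od280: float, mw: float, seq: str) -> float:
    """Conc. = OD280 x MW / (5690 x W + 1280 x Y), assuming a 1 cm path."""
    eps = molar_absorptivity(seq)
    if eps == 0:
        raise ValueError("no Trp or Tyr: OD280 cannot be used")
    return od280 * mw / eps


# Toy example: one Trp and two Tyr, MW chosen to give round numbers.
print(molar_absorptivity("WYY"))
print(concentration_mg_per_ml(0.5, 16500.0, "WYY"))
```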

21
Hydrophobicity
  • Indicates Solubility
  • Indicates Stability
  • Indicates Location (membrane or cytoplasm)
  • Indicates Globularity or tendency to form
    spherical structure

22
Hydrophobicity
  • Average Hydrophobicity: $AH = \sum_i AA_i \times H_i$
  • Hydrophobic Ratio: $RH = \sum H(-) / \sum H(+)$
  • Hydrophobic Ratio: RHP = philic/phobic
  • Linear Charge Density: LIND = (K + R + D + E + H + 2)/
  • Solubility: $SOL = RH + LIND - 0.05 \times AH$
  • AH: average -2.5 ± 2.5; Insol > 0.1; Unstrc < -6
  • RH: average 1.2 ± 0.4; Insol < 0.8; Unstrc > 1.9
  • RHP: average 0.9 ± 0.2; Insol < 0.7; Unstrc > 1.4
  • LIND: average 0.25; Insol < 0.2; Unstrc > 0.4
  • SOL: average 1.6 ± 0.5; Insol < 1.1; Unstrc > 2.5

23
Different Types of Features
  • Composition Features
  • Mass, pI, Absorptivity, Hydrophobicity
  • Sequence Features
  • Active sites, Binding Sites, Targeting, Location,
    Property Profiles, 2° structure
  • Structure Features
  • Supersecondary Structure, Global Fold, ASA,
    Volume

24
Sequence Features
AHGQSDFILDEADGMMKSTVPN
HGFDSAAVLDEADHILQWERTY
GGGNDEYIVDEADSVIASDFGH
LIVMLIVMDEADLIVM LIVM (EIF 4A ATP DEPENDENT HELICASE)
25
Sites that Support Pattern Queries
  • OWL Database
  • http://umber.sbs.man.ac.uk/dbbrowser/OWL/
  • PIR Website
  • http://pir.georgetown.edu/pirwww/search/patmatch.html
  • ScanProsite at ExPASy
  • http://ca.expasy.org/tools/scanprosite/
  • PattinProt
  • http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_pattinprot.html/

26
Regular Expressions
  • C[ACG]T - Matches CAT, CCT and CGT only
  • C.T - Matches CAT, CaT, C1T, CXT, not CT
  • CA?T - Matches CT or CAT only
  • C+T - Matches CT, CCT, CCCT, CCCCT
  • C(HE)?A[TP] - Matches CHEAT, CAT, CHEAP, CAP
  • S[A-I,L-Q,T-Z]?LK[A-I,L-Q,T-Z]?A - Matches SLKA
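
These patterns can be checked with Python's `re` module. The sketch assumes the slide's patterns read `C[ACG]T`, `C.T`, `CA?T`, `C+T` and `C(HE)?A[TP]` once the bracket and plus characters lost in extraction are restored. The non-matching strings in the table are added here for illustration.

```python
import re

# pattern -> (strings that should match, strings that should not)
PATTERNS = {
    r"C[ACG]T": (["CAT", "CCT", "CGT"], ["CTT"]),
    r"C.T": (["CAT", "CaT", "C1T", "CXT"], ["CT"]),
    r"CA?T": (["CT", "CAT"], ["CAAT"]),
    r"C+T": (["CT", "CCT", "CCCT", "CCCCT"], ["T"]),
    r"C(HE)?A[TP]": (["CHEAT", "CAT", "CHEAP", "CAP"], ["CHAT"]),
}


def matches(pattern: str, text: str) -> bool:
    """True only if the whole string matches the pattern."""
    return re.fullmatch(pattern, text) is not None


for pattern, (hits, misses) in PATTERNS.items():
    ok = all(matches(pattern, t) for t in hits)
    ok = ok and not any(matches(pattern, t) for t in misses)
    print(f"{pattern:12s} behaves as described: {ok}")
```

`re.fullmatch` is used so that a pattern must cover the entire string; `re.search` would instead find the pattern anywhere inside a longer protein sequence.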

27
PROSITE Pattern Expressions
  • C-[ACG]-T - Matches CAT, CCT and CGT only
  • C-x-T - Matches CAT, CCT, CDT, CET, etc.
  • C-{A}-T - Matches every CXT except CAT
  • C(1,3)-T - Matches CT, CCT, CCCT
  • C-A(2)-[TP] - Matches CAAT, CAAP
  • [LIV]-[VIC]-x(2)-G-[DENQ]-x-[LIVFM](2)-G
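
PROSITE syntax maps almost directly onto regular expressions. The converter below is a minimal sketch (the function name `prosite_to_regex` is made up here) covering only the elements used on this slide, plus the `<` and `>` anchors: `x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, and `(n)` or `(n,m)` for repeats.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into Python regex syntax."""
    out = pattern.rstrip(".").replace("-", "")     # drop element separators
    out = out.replace("{", "[^").replace("}", "]")  # {A}   -> [^A]
    out = out.replace("(", "{").replace(")", "}")   # (1,3) -> {1,3}
    out = out.replace("x", ".").replace("X", ".")   # x     -> any residue
    out = out.replace("<", "^").replace(">", "$")   # sequence anchors
    return out


for p in ["C-[ACG]-T", "C-x-T", "C-{A}-T", "C(1,3)-T", "C-A(2)-[TP]"]:
    print(f"{p:14s} -> {prosite_to_regex(p)}")

# Scanning a sequence for a motif:
regex = prosite_to_regex("C-{A}-T")
print([m.start() for m in re.finditer(regex, "GGCATCGTCCT")])
```

The order of the replacements matters: braces are converted to negated classes before parentheses are converted to braces, so the two never collide.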
28
Sequence Feature Databases
  • PROSITE - http://ca.expasy.org/prosite/
  • BLOCKS - http://www.blocks.fhcrc.org/
  • DOMO - http://www.infobiogen.fr/services/domo/
  • PFAM - http://pfam.wustl.edu
  • PRINTS - http://www.bioinf.man.ac.uk/dbbrowser/PRINTS/
  • SEQSITE - PepTool

29
Phosphorylation Sites
[Figure: phosphorylated side chains: phosphotyrosine (pY), phosphothreonine (pT), phosphoserine (pS)]
30
Phosphorylation Sites
31
Signaling Sites
32
Protease Cut Sites
33
Binding Sites
34
Family Signature Sequences
35
Enzyme Active Sites
36
Better Methods for Sequence Feature ID
  • Sequence Profiles/Scoring Matrices
  • Neural Networks
  • Hidden Markov Models
  • Bayesian Belief Nets
  • Reference Point Logistics

37
What Can Be Predicted?
  • O-Glycosylation Sites
  • Phosphorylation Sites
  • Protease Cut Sites
  • Nuclear Targeting Sites
  • Mitochondrial Targeting Sites
  • Chloroplast Targeting Sites
  • Signal Sequences
  • Signal Sequence Cleavage
  • Peroxisome Targeting Sites
  • ER Targeting Sites
  • Transmembrane Sites
  • Tyrosine Sulfation Sites
  • GPI Anchor Sites
  • PEST sites
  • Coiled-Coil Sites
  • T-Cell/MHC Epitopes
  • Protein Lifetime
  • A whole lot more.

38
Cutting Edge Sequence Feature Servers
  • Membrane Helix Prediction
  • http://www.cbs.dtu.dk/services/TMHMM-2.0/
  • T-Cell Epitope Prediction
  • http://syfpeithi.bmi-heidelberg.com/scripts/MHCServer.dll/home.htm
  • O-Glycosylation Prediction
  • http://www.cbs.dtu.dk/services/NetOGlyc/
  • Phosphorylation Prediction
  • http://www.cbs.dtu.dk/services/NetPhos/
  • Protein Localization Prediction
  • http://psort.nibb.ac.jp/

39
Subcellular Localization
http://www.cs.ualberta.ca/bioinfo/PA/Sub/
40
Profiles & Motifs are Useful
  • Helped identify active site of HIV protease
  • Helped identify SH2/SH3 class of STPs
  • Helped identify important GTP oncoproteins
  • Helped identify hidden leucine zipper in HGA
  • Used to scan for lectin binding domains
  • Regularly used to predict T-cell epitopes

41
Amino Acid Property Profiles
42
Amino Acid Property Profiles
  • Intent is to predict a protein's physical
    properties directly from sequence as opposed to
    composition or wet chemistry
  • Offers a more detailed, graphical view of
    sequence-specific properties than compositional
    analysis (more powerful?)
  • Underlying assumption is amino acid properties
    are additive

43
Common Property Profiles
  • Hydrophobicity (Watch Scales!)
  • Helical Wheel (Not a True Profile)
  • Hydrophobic Moments (Helix & Beta sheet)
  • Flexibility (Thermal B Factors)
  • Surface Accessibility (ASA)
  • Antigenicity (B-cell epitopes/T-cell epitopes)

44
Hydrophobicity Profile
  • Plotted using $\langle H \rangle_i = \sum_{n=i-k}^{i+k} H_n / (2k + 1)$
  • Shows location of membrane spanning regions,
    epitopes, surface exposed AAs, etc.
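
The sliding-window average can be sketched as follows. The slide does not name a hydrophobicity scale, so the Kyte-Doolittle scale is used here as an assumption, and the default half-width `k = 9` (a 19-residue window) is likewise an illustrative choice. The function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the slide names none).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, k: int = 9) -> list:
    """Window average <H>_i over residues i-k..i+k.

    Returns one value per position that has a full window,
    i.e. len(seq) - 2k values (empty if the sequence is too short).
    """
    values = [KD[aa] for aa in seq.upper()]
    window = 2 * k + 1
    return [
        sum(values[i - k : i + k + 1]) / window
        for i in range(k, len(values) - k)
    ]


# Toy sequence: a hydrophobic stretch flanked by charged residues.
toy = "KKDE" + "LIVALLIVAVLLIAVLIVA" + "DEKK"
profile = hydropathy_profile(toy, k=3)
print([round(h, 2) for h in profile])
```

Different scales can give noticeably different profiles, which is why the window size and scale should always be reported alongside a plot.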

45
Flexibility
  • B factors from X-ray crystallography
  • Potentially identifies antigenic and active sites
    from sequence data alone

46
Membrane Spanning Regions
47
Predicting via Hydrophobicity
Bacteriorhodopsin, OmpA
48
Predicting via Hydrophobicity
49
Predicting via Neural Nets
  • PHDhtm - http://cubic.bioc.columbia.edu/predictprotein/submit_adv.html
  • TMAP - http://www.mbb.ki.se/tmap/index.html
  • TMPred - http://www.ch.embnet.org/software/TMPRED_form.html

ACDEGF...
50
Secondary Structure
51
Secondary Structure Prediction
52
Secondary Structure Prediction
  • Statistical (Chou-Fasman, GOR)
  • Homology or Nearest Neighbor (Levin)
  • Physico-Chemical (Lim, Eisenberg)
  • Pattern Matching (Cohen, Rooman)
  • Neural Nets (Qian & Sejnowski, Karplus)
  • Evolutionary Methods (Barton, Niemann)
  • Combined Approaches (Rost, Levin, Argos)

53
Chou-Fasman Statistics
54
Prediction Performance
55
Best of the Best
  • PredictProtein-PHD (72%)
  • http://cubic.bioc.columbia.edu/predictprotein
  • Jpred (73-75%)
  • http://www.compbio.dundee.ac.uk/www-jpred/
  • PSIpred (77%)
  • http://bioinf.cs.ucl.ac.uk/psipred/
  • Proteus (88%)
  • http://129.128.185.184:8080/proteus/

56
Sample Exam Questions
  • Here is the sequence for protein X, calculate its
    molar absorptivity
  • Here is the sequence for protein Y, try to locate
    the likely membrane spanning regions and explain
    your reasoning
  • Here is the sequence for protein Z, show the
    tryptic cleavage points