Shan Sundararaj - PowerPoint PPT Presentation

1
Protein Subcellular Localization
  • Shan Sundararaj
  • University of Alberta
  • Edmonton, AB
  • ss23_at_ualberta.ca

2
Why is Localization Important?
  • Function is dependent on context
  • Co-localization of proteins of related function
  • Valuable annotation for new proteins
  • Design of proteins with specific targets
  • Drug targeting
  • Accessibility
  • Membrane-bound > cytoplasmic > nuclear

3
Why is Localization Important?
  • 1999 Nobel Prize in Physiology/Medicine given to
    Günter Blobel
  • for the discovery that proteins have intrinsic
    signals that govern their transport and
    localization in the cell

4
Bacteria
  • Gram Positive (3-4 states): cytoplasm, cytoplasmic membrane, cell wall, extracellular
  • Gram Negative (5 states): cytoplasm, cytoplasmic membrane, periplasm, outer membrane, extracellular

5
Eukaryotic Cell
  • Compartmentalized
  • Diverse range of specific organelles
  • Plants: chloroplasts, chromoplasts, other
    plastids
  • Muscle: sarcoplasm
  • Various endosomes, vesicles

(modified from Voet & Voet, Biochemistry,
Wiley-VCH 1992)
6
Yet more categories
Chloroplast
Mitochondrion
Yeast specific
7
Level of Annotation
  • As simple as two states
  • membrane protein vs. non-membrane protein
  • secreted protein vs. non-secreted protein
  • Gross compartments
  • cytoplasm, inner membrane, periplasm, cell wall,
    outer membrane, extracellular
  • nucleus, mitochondria, peroxisome, vacuole
  • Fine compartments
  • Mitochondrial matrix, bud neck, spindle pole
  • Any of 1425 GO cellular compartments

8
Localization signaling
  • Proteins must have intrinsic signals for their
    localization: a cellular address
  • E.g. N-terminal signal sequences

321 Nuclear Inner Membrane Lane, Nucleus,
Intracellular County, Eukaryotic Cell CL34V3M3
9
Localization signaling
  • Some signals are easily recognizable
  • Signal peptidase cleavage site, consensus
    sequence for secretion → extracellular
  • Address printed neatly, postal code
  • Others are difficult to understand
  • Outer membrane β-barrel proteins, no consensus
    sequence, few sequence restraints
  • Sloppy address, different kind of code that we
    don't understand yet

10
Experimental determination
  • Since we don't fully understand the language of
    proteins, our knowledge must often come from
    inference
  • Predicting localization is like sorting mail
    based only on examples of where some mail has
    gone before
  • Important to have good data sets of proteins with
    known localizations

11
Datasets
  • DBSubLoc (http://www.bioinfo.tsinghua.edu.cn/guotao/download.html)
  • Combines SwissProt and PIR localization
    annotations (64051 proteins!)
  • PSORT-B
  • (http://www.psort.org/dataset/)
  • 1591 Gram -ve proteins, 576 Gram +ve proteins
  • SignalP
  • (http://www.cbs.dtu.dk/ftp/signalp/)
  • 940 plant and 2738 human proteins
  • YPL
  • (http://bioinfo.mbb.yale.edu/genome/localize/)
  • 2956 yeast proteins

12
Experimental Methods
  • Electron microscopy
  • GFP tagging / fluorescence microscopy
  • Subcellular fractionation + detection
  • Western blotting
  • Mass spectrometry

13
Electron Microscopy
  • Highest resolution, can work at the level of a
    single protein complex
  • Immunolabel proteins of interest in conjunction
    with colloidal gold, and visualize
  • Combined with electron tomography, can even
    visualize unlabeled complexes

(from Koster and Klumperman, Nat Rev Mol Cell
Biol, Sep 2003, S6-10)
14
Fluorescence Microscopy
  • Tag gene at either 3' or 5' end
  • Using GFP (or RFP, YFP, CFP, etc.)
  • Using an epitope tag and a fluorescently labeled
    antibody
  • Careful of removing signal peptides!
  • Also use a subcellular-specific marker or stain
  • Visualize with confocal fluorescence microscopy
    and analyze images for co-localization
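
One common way to quantify co-localization between a tagged protein and a compartment marker is the Pearson correlation of pixel intensities in the two channels. The sketch below assumes the two images have already been flattened into equal-length lists of intensities; the function name and the pixel values are hypothetical, and real image analysis would also need background subtraction and thresholding.

```python
from math import sqrt

def pearson_colocalization(channel_a, channel_b):
    """Pearson correlation between two equal-length lists of pixel intensities."""
    if len(channel_a) != len(channel_b) or not channel_a:
        raise ValueError("channels must be non-empty and the same length")
    n = len(channel_a)
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat channel carries no co-localization information
    return cov / sqrt(var_a * var_b)

# Hypothetical flattened intensities: GFP-tagged protein vs. a nuclear stain.
gfp = [10, 200, 180, 15, 12, 190]
stain = [8, 210, 170, 20, 10, 185]
print(round(pearson_colocalization(gfp, stain), 3))
```

Values near 1 suggest the two signals rise and fall together; values near 0 suggest no relationship.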

15
Specific co-labeling (yeast)
  • Early Golgi: Cop1
  • Endosome: Snf7
  • ER to Golgi: Sec13
  • Golgi apparatus: Anp1
  • Late Golgi: Chc1
  • Lipid particle: Erg6
  • Mitochondrion: MitoTracker
  • Nucleus: DAPI
  • Nucleolus: Sik1
  • Nuclear periphery: Nic96
  • Peroxisome: Pex3
  • Vacuole: FM4-64

Nuclear-specific DAPI staining
16
Subcellular Fractionation
Differential centrifugation of a tissue homogenate:
  • 1000 g: pellet contains unbroken cells, nuclei, chloroplasts; transfer supernatant
  • 10,000 g: pellet contains mitochondria; transfer supernatant
  • 100,000 g: pellet contains the microsomal fraction (ER, Golgi, lysosomes, peroxisomes); supernatant contains cytosol and soluble enzymes
17
Detergent Fractionation
Cells
  • Extraction with Digitonin/EDTA: supernatant is the Cytoplasmic Fraction
  • Pellet: extraction with Triton X-100/EDTA gives Organelle Membranes
  • Pellet: extraction with SDS/EDTA gives Nuclear and Cytoskeletal (in SDS) fractions
18
Fractionation → Identification
  • Once fractionated, take compartment of interest
    and separate proteins
  • 2D gel or chromatography
  • Identify separated proteins
  • Mass spectrometry for high-throughput
  • Western blot for specific proteins

19
Fractionation in proteomics
20
Recent High-Throughput Exp.
  • Kumar et al., Genes Dev 2002, 16:707-719
  • Epitope-tagged >60% of ORFs, visualized with
    fluorescently labeled antibody
  • 2744 localizations (44% of S. cerevisiae genes)
  • Huh et al., Nature 2003, 425:686-691
  • GFP tagged all ORFs, RFP tagged compartments
  • 4156 localizations (75% of S. cerevisiae genes)
  • Combined, now nearly 87% of yeast proteins have a
    localization annotation

21
Predictions from known data
  • Enough experimental data exists to build highly
    accurate computational predictors of localization

22
Predictions from known data
  • Different information used for predictions
  • Sequence motifs
  • N-terminal secretory signal peptides,
    mitochondrial targeting peptide, chloroplast
    transit peptide
  • C-terminal peroxisome import signal, ER
    retention signal
  • Mid-sequence nuclear localization signals
  • Amino acid composition
  • AA frequency, dipeptide composition.
  • Homology
  • Sequence comparison to proteins of known
    localization
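
The composition and motif features listed above can be sketched in a few lines. This is a minimal illustration, not the feature set of any particular predictor: the function names and the toy sequence are hypothetical, and the two C-terminal checks use simplified exact matches (the textbook SKL peroxisomal signal and KDEL ER-retention signal), whereas real signals tolerate variants.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_composition(seq):
    """Fraction of each of the 20 standard amino acids in seq."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return {a: counts[a] / total for a in AMINO_ACIDS}

def dipeptide_composition(seq):
    """Fraction of each of the 400 ordered residue pairs in seq."""
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    total = sum(pairs[k] for k in keys) or 1
    return {k: pairs[k] / total for k in keys}

def c_terminal_motifs(seq):
    """Flag two textbook C-terminal signals (simplified exact matches)."""
    return {
        "peroxisome_import": seq.endswith("SKL"),
        "ER_retention": seq.endswith("KDEL"),
    }

seq = "MKTLLLAVAVSAAHSKDEL"  # hypothetical toy sequence
print(aa_composition(seq)["L"])
print(dipeptide_composition(seq)["LL"])
print(c_terminal_motifs(seq))
```

The resulting dictionaries can be turned into fixed-length numeric vectors (20 and 400 entries) for a classifier.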

23
N-terminal signal peptides
24
N-terminal signal peptides
  • Common structure of signal peptides
  • positively charged n-region, followed by a
    hydrophobic h-region and a neutral but polar
    c-region.
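
As a rough illustration of the n-region / h-region pattern, the sketch below counts positively charged residues at the very N-terminus and then finds the most hydrophobic window that follows, using the Kyte-Doolittle hydropathy scale. The region length, window size and scan length are assumed parameters chosen for illustration, the example sequence is illustrative, and this is not how SignalP or any other published tool works.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def signal_peptide_features(seq, n_len=5, window=8, scan=30):
    """Crude n-region and h-region descriptors for the N-terminus of seq."""
    n_region = seq[:n_len]
    n_positive = sum(1 for r in n_region if r in "KR")  # charged n-region residues
    region = seq[:scan]
    best_start, best_score = None, None
    # Slide a window over the residues after the n-region and keep the
    # stretch with the highest mean hydropathy as the candidate h-region.
    for start in range(n_len, len(region) - window + 1):
        chunk = region[start:start + window]
        score = sum(KD.get(r, 0.0) for r in chunk) / window
        if best_score is None or score > best_score:
            best_start, best_score = start, score
    return {"n_positive": n_positive, "h_start": best_start, "h_score": best_score}

# Illustrative N-terminus in the style of a bacterial secretory signal peptide.
example = "MKKTAIAIAVALAGFATVAQAAPKDNTWYT"
print(signal_peptide_features(example))
```

A charged start followed by a strongly hydrophobic window is only suggestive; a real predictor would also model the c-region and the cleavage site.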

25
More work to do
  • Multiple bacterial secretion pathways
  • C-terminal signal peptides
  • Internal mitochondrial transit peptides
  • Structural aspects of targeting
  • Gene re-localization
  • Still a lot to discover in how signaling works!

26
Computational methods for predicting localization
  • Expert rule based methods
  • Artificial Neural Nets (ANN)
  • Hidden Markov Models (HMM)
  • Naïve Bayes (NB)
  • Support Vector Machines (SVM)
  • Combination of above methods

27
Naïve Bayes
  • Assumption
  • Features are conditionally independent, given class labels
  • Structure
  • 1-level tree
  • Class labels: root
  • Features: leaf nodes
  • Prediction
  • $\mathrm{class}(f) = \arg\max_{c} P(C=c)\,P(F=f \mid C=c)$
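
The prediction rule on this slide (pick the class c that maximizes P(C=c) times the product of per-feature likelihoods) can be sketched as a small Bernoulli Naïve Bayes over annotation tokens. The training examples, the function names and the choice of Laplace smoothing are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import log

def train_naive_bayes(examples):
    """examples: list of (set_of_tokens, class_label) pairs."""
    class_counts = Counter(label for _, label in examples)
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in examples:
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, token_counts, vocab

def predict(model, tokens):
    """Return (best_class, log_scores) using argmax_c P(C=c) * prod P(F=f | C=c)."""
    class_counts, token_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = log(n_c / total)  # log prior, log P(C=c)
        for t in vocab:
            # Laplace-smoothed probability that token t is present in class c.
            p = (token_counts[c][t] + 1) / (n_c + 2)
            score += log(p if t in tokens else 1 - p)
        scores[c] = score
    return max(scores, key=scores.get), scores

# Hypothetical training data: keyword tokens and a known localization.
training = [
    ({"Signal", "Fungicide"}, "extracellular"),
    ({"Signal", "Plant defense"}, "extracellular"),
    ({"Kinase", "Glycolysis"}, "cytoplasm"),
    ({"Kinase"}, "cytoplasm"),
]
model = train_naive_bayes(training)
print(predict(model, {"Signal"})[0])
```

Working in log space avoids underflow when many features are multiplied together.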

28
Artificial Neural Network
  • Excellent for modeling non-linear input/output
    relationships
  • Robust to noise in training data
  • Widely used in bioinformatics

29
Support Vector Machines
  • Input vectors are separated into positive vs.
    negative instances
  • Map to new feature space
  • Find hyperplane that best separates the two
    classes by distance
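
For reference, the "best separating hyperplane" is usually written as the standard hard-margin problem below. The symbols are not from the slides: $x_i$ are the input vectors, $y_i \in \{+1, -1\}$ their positive/negative labels, $\phi$ the map to the new feature space, and $w$, $b$ the hyperplane parameters. Practical SVMs normally use a soft-margin variant that tolerates some misclassified points.

```latex
\min_{w,\,b}\ \tfrac{1}{2}\lVert w\rVert^{2}
\quad\text{subject to}\quad
y_i\,\bigl(w^{\top}\phi(x_i)+b\bigr)\ \ge\ 1,
\qquad i = 1,\dots,n
```

Minimizing $\lVert w\rVert$ maximizes the margin $2/\lVert w\rVert$ between the two classes, and a new instance $x$ is classified by the sign of $w^{\top}\phi(x)+b$.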

30
Evaluating Predictors - Precision
Predicted
True
  • # of proteins correctly labeled as cyt divided
    by the total # of proteins labeled as cyt
  • How often the label is correct
  • If there are 90 proteins correctly labeled as
    cyt, and 10 proteins incorrectly labeled as
    cyt, then the precision is 90/100 = 0.90.

31
Evaluating Predictors - Sensitivity
Predicted
True
  • # of proteins correctly labeled as cytoplasmic
    divided by the total # of proteins that are
    cytoplasmic
  • How many of the true results were retrieved
    (also called recall or true positive rate)
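
Both measures follow directly from counts of true positives (TP), false positives (FP) and false negatives (FN) for one class. The sketch below reuses the 90 correct / 10 incorrect figures from the precision example; the 30 missed cytoplasmic proteins in the recall call are a hypothetical number added for illustration.

```python
def confusion_counts(true_labels, predicted_labels, target):
    """Count TP, FP and FN for one class label."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if pred == target and truth == target:
            tp += 1
        elif pred == target:
            fp += 1  # labeled as target, but actually something else
        elif truth == target:
            fn += 1  # truly target, but labeled as something else
    return tp, fp, fn

def precision(tp, fp):
    """Fraction of predicted-positive items that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Fraction of truly positive items that were predicted positive."""
    return tp / (tp + fn) if (tp + fn) else 0.0

print(precision(90, 10))  # the 90/100 example from the precision slide
print(recall(90, 30))     # hypothetical: 30 cytoplasmic proteins were missed
```

A predictor can trade one measure for the other, so both are usually reported together.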

32
Predictions from known data
  • Different information used for predictions
  • Sequence motifs
  • N-terminal secretory signal peptides,
    mitochondrial targeting peptide, chloroplast
    transit peptide
  • C-terminal peroxisome import signal, ER
    retention signal
  • Mid-sequence nuclear localization signals
  • Amino acid composition
  • AA frequency, dipeptide composition.
  • Homology
  • Sequence comparison to proteins of known
    localization

33
TargetP, SignalP, P (http://www.cbs.dtu.dk/services/)
  • Sequence-based methods
  • TargetP (85-90% recall)
  • Predicts mitochondria/chloroplast/secreted
  • Contains SignalP and ChloroP
  • LipoP
  • lipoproteins and signal peptides in Gram negative
    bacteria
  • SecretomeP
  • non-classical secretion in eukaryotes

34
SignalP result
Cleavage site
  • Prediction: Signal peptide
  • Signal peptide probability: 0.945
  • Signal anchor probability: 0.000
  • Max cleavage site probability: 0.723 between pos. 28 and 29
35
Organellar Prediction
  • Predotar (http://www.inra.fr/predotar/) (80%
    recall)
  • Mitochondrial and plastid sequences: N-terminal
    sequences
  • MitoPred (http://mitopred.sdsc.edu/)
  • Mitochondrial: Pfam domains, AA composition
  • MitoProteome (http://www.mitoproteome.org/)
  • Database of experimentally predicted human
    mitochondrial proteins
  • MitoP (http://ihg.gsf.de/mitop2/)
  • Combines data from multiple experimental and
    computational sources to give a consensus score
    for each mitochondrial protein in yeast and
    human

36
The PSORT Family
  • PSORT: plant sequences
  • PSORT II: eukaryotic sequences
  • iPSORT: eukaryotic N-term. signal sequences
  • PSORT-B: bacterial sequences

37
PSORT-B (http://www.psort.org/psortb/)
38
PSORT-B - methods
  • Signal peptides: Non-cytoplasmic
  • AA composition/patterns
  • SVMs trained for each location vs. all other
    locations
  • Transmembrane helices: Inner membrane
  • HMMTOP
  • PROSITE motifs: all localizations
  • Outer membrane motifs: Outer membrane
  • Homology to proteins of known localization
  • SCL-BLAST

Integration with a Bayesian network
39
PSORT-B results
  • SeqID: Unannotated_bacterial2
  • Analysis Report
  • CMSVM-: Unknown (No details)
  • CytoSVM-: Cytoplasmic (No details)
  • ECSVM-: Unknown (No details)
  • HMMTOP-: Unknown (No internal helices found)
  • Motif-: Unknown (No motifs found)
  • OMPMotif-: Unknown (No motifs found)
  • OMSVM-: Unknown (No details)
  • PPSVM-: Unknown (No details)
  • Profile-: Unknown (No matches to profiles found)
  • SCL-BLAST-: Cytoplasmic (matched 118438 Cyto. protein)
  • SCL-BLASTe-: Unknown (No matches against database)
  • Signal-: Unknown (No signal peptide detected)
  • Localization Scores
  • Cytoplasmic: 9.97
  • CytoplasmicMembrane: 0.01
  • Periplasmic: 0.01
  • OuterMembrane: 0.00
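
A simple way to turn localization scores like these into a single call is to take the top-scoring site and report "Unknown" when it falls below a confidence cutoff. The sketch below uses the four scores shown in this report; the function name is hypothetical and the cutoff of 7.5 is an assumption for illustration, not a value given on the slides.

```python
def call_localization(scores, cutoff=7.5):
    """Return (site, score) for the top site, or ('Unknown', score) below cutoff.

    The cutoff default is an assumed value for illustration only.
    """
    site, best = max(scores.items(), key=lambda item: item[1])
    return (site if best >= cutoff else "Unknown"), best

# Scores copied from the example report on this slide.
scores = {
    "Cytoplasmic": 9.97,
    "CytoplasmicMembrane": 0.01,
    "Periplasmic": 0.01,
    "OuterMembrane": 0.00,
}
print(call_localization(scores))
```

Declining to make a call on low-scoring proteins trades recall for precision.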

40
Proteome Analyst (http://www.cs.ualberta.ca/bioinfo/PA/Sub/)
41
Proteome Analyst - Method
42
Proteome Analyst - Feature Extraction
43
Proteome Analyst Feature Extraction
  • Top 3 homologs
  • AFP1_ARATH
  • AFP1_BRANA
  • AFP2_ARATH
  • KW: Plant defense; Fungicide; Signal; Multigene family; Pyrrolidone carboxylic acid
  • DR InterPro: IPR002118; IPR003614
  • CC Subcellular location: Secreted
  • Token set: Plant defense; Fungicide; Signal; Multigene family; Pyrrolidone carboxylic acid; IPR002118; IPR003614; Secreted
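
The token set is essentially the union of selected annotation fields across the top homologs. The sketch below shows that pooling step, assuming the homolog annotations have already been parsed into dictionaries. The function name is hypothetical and so is the way the fields are split across the three homologs, since the slide only shows the pooled result.

```python
FIELDS = ("KW", "DR InterPro", "CC Subcellular location")

def build_token_set(homolog_annotations, fields=FIELDS):
    """Pool the values of the chosen annotation fields across all homologs."""
    tokens = set()
    for annotation in homolog_annotations:
        for field in fields:
            tokens.update(annotation.get(field, []))
    return tokens

# Hypothetical per-homolog annotations; only the pooled tokens come from the slide.
homologs = [
    {"id": "AFP1_ARATH",
     "KW": ["Plant defense", "Fungicide", "Signal"],
     "DR InterPro": ["IPR002118", "IPR003614"],
     "CC Subcellular location": ["Secreted"]},
    {"id": "AFP1_BRANA",
     "KW": ["Plant defense", "Fungicide", "Pyrrolidone carboxylic acid"],
     "DR InterPro": ["IPR002118"]},
    {"id": "AFP2_ARATH",
     "KW": ["Plant defense", "Multigene family", "Signal"],
     "CC Subcellular location": ["Secreted"]},
]
print(sorted(build_token_set(homologs)))
```

Restricting `fields` to a subset changes which evidence the classifier sees.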
44
PASub - Results
Contribution of each token
Log scale
Features
45
PASub - Interpretation
  • Bars represent -log probability, so a little
    difference is a lot!
  • Naïve Bayes chosen as classifier because of
    transparency of method
  • Each token gives a probability that can be summed
    and shown graphically
  • Neural network actually has higher recall
  • Can change token set, ask to explain with
    different features
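
The per-token explanation can be sketched as follows: each token contributes -log10 P(token | class), and the contributions are summed per class, so a difference of 1 on this scale is a tenfold difference in probability. All probabilities below are hypothetical, the function name is invented, and class priors are left out to keep the example short.

```python
from math import log10

def explain_prediction(tokens, cond_prob):
    """Per-class, per-token -log10 P(token | class), plus totals and the best class."""
    contributions = {
        cls: {t: -log10(probs[t]) for t in tokens if t in probs}
        for cls, probs in cond_prob.items()
    }
    totals = {cls: sum(per_token.values()) for cls, per_token in contributions.items()}
    best = min(totals, key=totals.get)  # smallest -log sum means highest probability
    return best, totals, contributions

# Hypothetical conditional probabilities P(token | class).
cond_prob = {
    "extracellular": {"Signal": 0.6, "Fungicide": 0.2},
    "cytoplasm": {"Signal": 0.05, "Fungicide": 0.01},
}
best, totals, contributions = explain_prediction(["Signal", "Fungicide"], cond_prob)
print(best)
for cls, per_token in contributions.items():
    print(cls, {t: round(v, 2) for t, v in per_token.items()})
```

Each per-token value corresponds to one bar segment in a plot of this kind, which is what makes the classifier's reasoning easy to inspect.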

46
Save Time: Pre-computed Genomes
  • PSORT-B 2.0
  • http://www.psort.org/genomes/
  • 103 Gram -ve bacteria, 45 Gram +ve bacteria
  • Proteome Analyst
  • http://www.cs.ualberta.ca/bioinfo/
  • Human, mouse, fly, yeast, Plasmodium falciparum,
    E. coli, B. subtilis