Linear motifs and phosphorylation sites

Slides: 84

Provided by: AV3

1
Linear motifs and phosphorylation sites
2
What is a linear motif? (in molecular biology)
3
a first taste
Short sequence of amino acids encoding a
particular molecular function
Functional sites
Linear Motifs
We need a more accurate definition!
4
What are you going to learn about Linear Motifs?
Where can we find them?
Why are they important?
Can we classify them?
How can we represent them?
How can we discover them?
When and how can we use them?
What are tools and resources to handle them?
6
Tyrosine kinase Src has several functional sites
7
p53 is full of functional sites
CYCLIN
MDM2
NES
TAFII31
CBP
P300
S100B
NLS
SIR2
P300
Pin1 P-Ser-Pro isomerisation
Acetylation
SUMO
Ubiquitinylation
phosphorylation
8
The sequences of many proteins contain short,
conserved motifs that are involved in recognition
and targeting activities, often separate from
other functional properties of the molecule in
which they occur.
These motifs are linear, in the sense that
three-dimensional organization is not required to
bring distant segments of the molecule together
to make the recognizable unit.
Tim Hunt (TIBS 1990)
9
The conservation of these motifs varies: some are
highly conserved while others, for example, allow
substitutions that retain only a certain pattern
of charge across the motif.
Tim Hunt (TIBS 1990)
10
A more accurate definition

short, common stretches of polypeptide chains
(3-10 amino acid residues long)

embody a distinct molecular function independent
of a larger sequence/structure context.

bind with low affinity (1.0-150 μM) and mediate
transient interactions

are nearly always involved in regulation

are involved in protein/domain-protein/domain
interactions

often reside in disordered or low-complexity
regions

often become ordered upon binding to another
protein or domain

occurrences of LMs seem to arise or disappear as
a result of point mutations

11
What are you going to learn about Linear Motifs?
Where can we find them?
Why are they important?
Can we classify them?
How can we represent them?
How can we discover them?
When and how can we use them?
What are tools and resources to handle them?
12
Why are they important?
Evolutionarily unrelated proteins sharing a
functional feature are likely to contain similar
linear motifs
This may be the result of
- convergent evolution
- evolutionary conservation in a divergent evolution process
In any case, linear motifs are indicative of
functions
In other words
They are made up of the amino acid residues
encoding a functional site
With the appropriate tools, they can be used to
identify protein functions and functional
regions (in a protein sequence and on its
three-dimensional structure, if available)
13
What are you going to learn about Linear Motifs?
Where can we find them?
Why are they important?
Can we classify them?
How can we represent them?
How can we discover them?
When and how can we use them?
What are tools and resources to handle them?
14
Can we classify LMs? How?
15
Can we classify LMs? How?
Functional site (Linear Motif)
Functional group
16
PRACTICE Let's find linear motifs in human p53
Go to the UniProt website http://www.uniprot.org/
Type p53 in the Query text box and select P04637
or
Type directly either P04637 or P53_HUMAN in the
Query text box
Work in groups and analyse the p53 entry record

how many LMs can you identify?
which function(s) are they indicative of?
are they always annotated as motif?
can you classify them according to the 4
categories?

17
What are you going to learn about Linear Motifs?
Where can we find them?
Why are they important?
Can we classify them?
How can we represent them?
How can we discover them?
When and how can we use them?
What are tools and resources to handle them?
18
How can we represent LMs?
Alignment of cyclin ligands
inhibitors
Regular expression [RK].L.{0,1}[FLIV]
20
Regular Expression (regexp)
L        single amino acid (L = Leucine)
[KR]     different amino acids allowed at this position
x or .   wildcard
{0,1}    variable length
21
Regular Expression Examples
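As one possible example, the cyclin ligand expression from the earlier slide can be tried directly in Python, whose `re` module uses the same bracket and brace notation. This is only a minimal sketch: the peptides below are invented for illustration and are not real cyclin ligands.

```python
import re

# Cyclin ligand motif from the earlier slide, written as a Python regexp.
CYCLIN_MOTIF = re.compile(r"[RK].L.{0,1}[FLIV]")

# Hypothetical peptides, made up for illustration only.
peptides = ["AAKRRLFGAA", "AARSLAIAAA", "AAKALAAFAA"]


def has_motif(seq):
    """Return True if the motif occurs anywhere in the sequence."""
    return CYCLIN_MOTIF.search(seq) is not None


for pep in peptides:
    print(pep, has_motif(pep))
```

The third peptide fails because two residues separate the L from the hydrophobic position, while `.{0,1}` allows at most one.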
22
Before we describe what regexps are useful for,
let's briefly see how to discover de novo motifs
In some cases, the structure and function of an
unknown protein which is too distantly related to
any protein of known structure to detect its
affinity by overall sequence alignment may be
identified by its possession of a particular
cluster of residue types classified as a motif.
The motifs, or templates, or fingerprints, arise
because of particular requirements of binding
sites that impose very tight constraint on the
evolution of portions of a protein sequence
Arthur Lesk, 1988
23
What are you going to learn about Linear Motifs?
Where can we find them?
Why are they important?
Can we classify them?
How can we represent them?
How can we discover them?
When and how can we use them?
What are tools and resources to handle them?
24
In contrast to domains, which are readily
detectable by sequence comparison, linear motifs
are difficult to discover due to their short
length, a tendency to reside in disordered
regions in proteins, and limited conservation
outside of closely related species.
Neduva et al. PLoS Biology 2005
25
De novo Linear Motif discovery
- Study literature paper(s)/review(s) on a group
of unrelated proteins sharing a function
- Build an alignment of these proteins
- Add to the alignment other sequences relevant
to the subject under consideration
- Pay attention to the residues and regions
thought or proved to be important to the
biological function of that group of proteins
(enzyme catalytic sites, PTM sites, regions
involved in binding)
- Try to find a short conserved sequence which
includes functionally important residues
26
Discovery of de novo Linear Motif
There are algorithms that do it automatically
Neduva et al. PLoS Biology 2005
27
Discovery of de novo Linear Motif
Our central hypothesis is that proteins with a
common interaction partner will share a feature
that mediates binding, either a domain or a
linear motif. In the absence of a shared domain,
a linear motif could well be the only common
sequence feature and might thus be detectable
simply by virtue of over-representation, which is
the basis of our approach.
Neduva et al. PLoS Biology 2005
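The over-representation idea can be illustrated with a simple binomial test. This is only a sketch of the general principle, not the algorithm of Neduva et al. or of any published tool; the interactor sequences, the candidate pattern and the background probability are all hypothetical values chosen for illustration.

```python
import re
from math import comb


def support(pattern, sequences):
    """Number of sequences containing at least one match to the pattern."""
    regex = re.compile(pattern)
    return sum(1 for seq in sequences if regex.search(seq))


def binomial_tail(n, k, p):
    """Probability of seeing k or more successes in n trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical proteins sharing an interaction partner (invented sequences).
interactors = ["MSAPPLPPRTE", "GGKPPLPPKSA", "TTAPPLPPLQQ", "MKKLLEESDDA"]

# Assumed chance that an unrelated protein contains the pattern.
p_background = 0.01

k = support("PPLPP", interactors)
p_value = binomial_tail(len(interactors), k, p_background)
print(k, p_value)
```

A small tail probability suggests the pattern occurs in more interactors than expected by chance. Real methods also have to correct for sequence homology and for the number of patterns tested.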
28
A probabilistic method for identifying
over-represented, convergently evolved, short
linear motifs in proteins.
Edwards et al. PLoS ONE 2007
29
PRACTICE Discovery of de novo Linear Motifs
Dilimot
http://dilimot.russelllab.org/
SLIMFinder
http://www.southampton.ac.uk/re1u06/software/slimfinder/
30
What are you going to learn about Linear Motifs?
Where can we find them?
Why are they important?
Can we classify them?
How can we represent them?
How can we discover them?
When and how can we use them?
What are tools and resources to handle them?
31
Linear Motif Databases
ELM
174 manually annotated motifs
Regular expression syntax, e.g. R.[RK]{1,2}.R
PROSITE
1632 documentation entries (domains and
functional sites)
Pattern syntax, e.g. R-x-[RK]-x(1,2)-R
16-03-2012
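Because PROSITE patterns and regular expressions are two notations for the same kind of object, a PROSITE-style pattern can be translated mechanically. The converter below is a minimal sketch written for this purpose (the function name is an illustrative choice, not part of any PROSITE software), and it ignores the `<` and `>` terminus anchors.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split off an optional repetition suffix such as (3) or (1,2).
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = m.group(1), m.group(2)
        if core == "x":
            core = "."                      # x means any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # {P} means any residue except P
        parts.append(core + ("{" + repeat + "}" if repeat else ""))
    return "".join(parts)


print(prosite_to_regex("R-x-[RK]-x(1,2)-R"))
print(prosite_to_regex("N-{P}-[ST]"))
```

Bracketed alternatives such as `[RK]` mean the same in both notations, so they pass through unchanged.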
32
What are regular expressions useful for?
How can we use regular expressions?
Regular expressions can be used to search for
motif occurrences in (uncharacterised) protein
sequences
There are algorithms that do this for us
We call the occurrence of a motif in a sequence
an INSTANCE of that motif
A motif (a regexp) can have many instances
TAU_HUMAN    KKVAVVRTPPKSPSSAKSRL
P85A_HUMAN   ISPPTPKPRPPRPLPVAPGS
BTK_HUMAN    EDQILKKPLPPEPAAAPVST
BTK_HUMAN    SHRKTKKPLPPTPEEDQILK
RAD51_HUMAN  TRICKIYDSPCLPEAEAMFA
SH3 ligand motif
[RKY]..P..P
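Finding the instances of a motif can be sketched in a few lines of Python. The example below scans the five fragments from the slide with the SH3 ligand expression, assuming the character classes are `[RKY]..P..P` and that fragments and entry names pair up in the order shown. A lookahead is used so that overlapping instances are also reported.

```python
import re

# SH3 ligand motif; the lookahead lets overlapping instances be reported.
SH3_MOTIF = re.compile(r"(?=([RKY]..P..P))")

# Fragments from the slide; the pairing with entry names is assumed
# to follow the order in which they are listed.
fragments = [
    ("TAU_HUMAN", "KKVAVVRTPPKSPSSAKSRL"),
    ("P85A_HUMAN", "ISPPTPKPRPPRPLPVAPGS"),
    ("BTK_HUMAN", "EDQILKKPLPPEPAAAPVST"),
    ("BTK_HUMAN", "SHRKTKKPLPPTPEEDQILK"),
    ("RAD51_HUMAN", "TRICKIYDSPCLPEAEAMFA"),
]


def find_instances(seq):
    """Return (1-based start, matched text) for every motif instance."""
    return [(m.start() + 1, m.group(1)) for m in SH3_MOTIF.finditer(seq)]


for name, seq in fragments:
    print(name, find_instances(seq))
```

Positions are relative to the fragment, not to the full-length protein. The second fragment contains two instances, which illustrates that one motif can match a sequence more than once.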
33
Prediction of new instances of Linear Motifs
ScanProsite
INPUT a protein sequence
OUTPUT PROSITE or user-defined motif matches in the input sequence
Allows the search for user-defined regular expressions
Scansite
INPUT a protein sequence
OUTPUT Scansite motif matches in the input sequence
ELM
INPUT a protein sequence
OUTPUT ELM motif matches in the input sequence
MiniMotifMiner
INPUT a protein sequence
OUTPUT MiniMotifMiner motif matches in the input sequence
34
PRACTICE Prediction of new instances of Linear
Motifs
Go to the ScanProsite website and search for the
RGD motif in the SwissProt database
http://prosite.expasy.org/scanprosite/
R-G-D
Select database
How many hits? How many hits are expected by
chance?
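A rough answer to the "expected by chance" question can be worked out by hand. The sketch below assumes that all 20 amino acids are equally frequent and that positions are independent, which real proteins do not satisfy; the database size is a hypothetical round figure, not the actual size of SwissProt.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Simplifying assumption: every amino acid has frequency 1/20.
UNIFORM = {aa: 1 / 20 for aa in AMINO_ACIDS}


def match_probability(allowed_per_position, freq=UNIFORM):
    """Probability that a random window matches the motif."""
    p = 1.0
    for allowed in allowed_per_position:
        p *= sum(freq[aa] for aa in allowed)
    return p


def expected_hits(allowed_per_position, total_residues, n_sequences):
    """Expected number of chance matches in a sequence database."""
    k = len(allowed_per_position)
    # Each sequence loses k - 1 start positions at its C-terminal end.
    windows = total_residues - n_sequences * (k - 1)
    return windows * match_probability(allowed_per_position)


# Hypothetical database: 500,000 sequences, 180 million residues in total.
print(expected_hits(["R", "G", "D"], 180_000_000, 500_000))
```

With these assumptions, a three-residue motif such as RGD matches one window in 8000, so a large database yields many thousands of chance hits.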
35
Regular expression pros and cons
Unfortunately, matches to such short motifs are often
not statistically significant, which creates a
signal-to-noise problem for bioinformatics tools
Advantages                                           Disadvantages
Memorable to humans                                  Overdetermined
Computationally fast                                 Motif may vary in other lineages
Standardised in scripting languages (Python, Perl)   Do not capture weaker preferences
Often, they can describe a motif very well           Easy to make a poor representation
36
Overprediction and context information
37
Functional sites only work in proper context
The cell knows how to discriminate true positives (TP) from false positives (FP)!
Knowledge of context can provide the basis for
filters for improved prediction of functional
sites
38
For example
39
Globular domain filter
Motifs are mostly found in disordered regions
The disordered regions are proving to be rich in
Linear Motifs
Src kinase
We can exploit this observation and filter out
motif matches inside domains
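A globular domain filter of this kind can be sketched as a simple interval test. The coordinates below are hypothetical and do not describe the real Src domain boundaries; in practice they would come from a domain database or a disorder predictor.

```python
def outside_domains(matches, domains):
    """Keep motif matches that do not overlap any annotated domain.

    Both arguments are lists of (start, end) pairs, 1-based and inclusive.
    """
    return [
        (m_start, m_end)
        for m_start, m_end in matches
        if all(m_end < d_start or m_start > d_end for d_start, d_end in domains)
    ]


# Hypothetical motif matches and domain boundaries.
matches = [(5, 11), (100, 106), (140, 150), (300, 306)]
domains = [(84, 145), (151, 248)]

print(outside_domains(matches, domains))
```

A match that only partly overlaps a domain is discarded here. A softer version could keep it with a reduced score instead.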
40
Structural Filter
Motif matches are not ALWAYS outside domains
Inside domains they are unlikely unless in
surface loops
When inside a domain, a motif match is more
likely to be a True Positive (TP) if it occurs in
a flexible (i.e. loop, turn or linker) and
accessible region of the domain
41
The RGD motif is recognized by different members
of the integrin family
An exposed instance of the RGD motif in a domain
An instance of the RGD motif in a region outside
a domain
42
MOD_N-GLC_1 (.(N)[^P][ST]..) is a motif for
N-glycosylation sites
Two MOD_N-GLC_1 motifs in a domain
43
Structural Filter
We can think of implementing a filter based
on the three-dimensional features of motifs (i.e.
their accessibility and secondary structure types)
If the match is not accessible: low score
If the match is in an α-helix: low score
If the match is in a β-strand: low score
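The scoring rules above could be sketched as follows. The thresholds and the penalty factor are arbitrary assumptions chosen for illustration, and the inputs (relative accessibility per residue, and a secondary structure string using H for helix, E for strand and C for coil) are assumed to come from an external structure analysis.

```python
def structural_score(accessibility, sec_struct, min_access=0.25, penalty=0.2):
    """Score a motif match from its structural context (1.0 = favourable).

    accessibility: relative solvent accessibility (0-1) per matched residue.
    sec_struct: one letter per matched residue, H (helix), E (strand), C (coil).
    """
    score = 1.0
    # Buried matches are unlikely to be reachable by a binding partner.
    if sum(accessibility) / len(accessibility) < min_access:
        score *= penalty
    # Matches mostly in regular secondary structure are penalised.
    regular = sum(1 for s in sec_struct if s in "HE")
    if regular / len(sec_struct) > 0.5:
        score *= penalty
    return score


print(structural_score([0.6] * 7, "CCCCCCC"))   # exposed loop
print(structural_score([0.05] * 7, "HHHHHHH"))  # buried helix
```

Multiplying penalties means a match that is both buried and in a helix scores lower than one that fails only a single criterion.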
44
Other features that can be used to filter out
FPs

Taxonomy
Cellular compartment
Evolutionary conservation

Davey NE et al. Mol Biosyst 2011
45
Why is a Conservation Score useful for linear
motif prediction?
Improve the prediction of LM instances by
discarding those matches that are unlikely to be
functional because they have not been conserved
during the evolution of the protein sequences
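One very simple way to express this idea is the fraction of orthologous sequences in which the aligned segment still matches the motif. This is a deliberately naive sketch and not the conservation score used by ELM or any other resource; the aligned segments are invented.

```python
import re


def conservation_score(motif, aligned_segments):
    """Fraction of aligned segments that still match the motif.

    Gap characters ("-") are removed before matching, so a deletion
    that shortens the motif counts as a loss.
    """
    regex = re.compile(motif)
    hits = sum(
        1 for seg in aligned_segments if regex.fullmatch(seg.replace("-", ""))
    )
    return hits / len(aligned_segments)


# Hypothetical alignment columns covering one SH3 ligand motif match.
segments = ["KPLPPTP", "KPLPPEP", "RPLPPTP", "KPLAPTP", "KP-PPTP"]

print(conservation_score("[RKY]..P..P", segments))
```

Matches with a low fraction could then be discarded or down-weighted. Real scores also weight sequences by their evolutionary distance.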
46
There is a resource which implements these filters
It associates a score to occurrences of motifs
based on

Cellular context
Molecular context
Domain context
Disorder
Taxonomy
Evolutionary conservation

47
The Eukaryotic Linear Motif (ELM) Resource
implements a logical filtering system to reduce
false matches
48
The Eukaryotic Linear Motif (ELM) Resource

Repository of information about functional sites
(including experimentally reported instances)

A motif-based query tool to find possible new
functional sites

A logical filtering system to reduce false
matches

49
The ELM Resource - An overview
50
PRACTICE The ELM server (http://elm.eu.org/)
Go to the ELM server
Search for motif matches in the EH domain-binding
mitotic phosphoprotein
51
Output 1
instance in structurally unfavourable context
annotated instance
Instance in unfavourable context
highly conserved instance
52
Output 2
53
Output 2
54
Browse the ELMs page for the Clathrin Box motif
in Endocytosis cargo adaptor proteins (ELM
LIG_AP2alpha_2)
55
Link to reported instances
58
Exploring unknown protein sequences
60
Phosphorylation sites
61
Phosphorylation is the addition of a phosphate
group (PO4) to a protein or small molecule.
The side-chain hydroxyl groups (-OH) of Ser, Thr
or Tyr residues are the most common targets
62
Reversible protein phosphorylation
ATP (adenosine triphosphate) is the energy
currency of the living world. Every cellular
process that requires energy gets it from ATP

It is rapid (a few seconds)
It is easily reversible

63
Reversible protein phosphorylation regulates most
aspects of cell life
64
Phosphorylation is a Post-Translational
Modification (PTM)
A kinase recognises its substrate and adds a
phosphate group (PO4) to one of its residues,
typically a Serine (Ser, S), Threonine (Thr, T),
or Tyrosine (Tyr, Y)
Amino acid phosphorylation is probably the
most abundant of the intracellular PTMs used to
regulate the state of eukaryotic cells, with
estimates ranging up to 500,000 phosphorylation
sites in the human proteome
65
Nevertheless
Substrate recognition is specific
In other words
Each kinase is capable of recognising its
substrate(s) in the cell
In fact, the enzymes must be specific and act
only on a defined subset of cellular targets to
ensure signal fidelity.
Even though the determinants of specificity are
still unclear
66
Substrate recruitment is one of the known
specificity mechanisms
The protein composition around the
phosphorylatable site is another factor
Kinases are capable of recognising the region
surrounding the phosphoacceptor residue (in
sequence and/or in structure)
In fact, kinases do not phosphorylate every Ser,
Thr, Tyr they encounter in the cell
Kreegipuu et al, NAR 1998
67
A phosphorylation site can be represented by a
phosphorylation motif
Experimentally verified phosphorylation motifs
can be used to predict new phosphorylation sites
and characterise kinase substrates
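As an illustration, a phosphorylation motif can be scanned in the same way as any other linear motif, reporting the position of the phosphoacceptor residue. The pattern below is the commonly quoted proline-directed consensus S/T-P-x-K/R, used here only as an example and not taken from the slides; the peptide is invented.

```python
import re

# Illustrative consensus: phosphoacceptor S/T, then P, any residue, K/R.
# The lookahead allows overlapping candidate sites to be reported.
KINASE_MOTIF = re.compile(r"(?=([ST])P.[KR])")


def predict_sites(seq):
    """Return (1-based position, residue) of each candidate phosphoacceptor."""
    return [(m.start() + 1, m.group(1)) for m in KINASE_MOTIF.finditer(seq)]


# Hypothetical peptide, made up for illustration.
print(predict_sites("AASPAKGGTPLRAA"))
```

Such matches are only candidates. As with other motifs, context filters and experimental evidence are needed to separate true sites from chance matches.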
68
There are many resources collecting P-sites and
many tools to predict P-sites in user-defined
protein sequences
Collection of instances of P-sites
Phospho.ELM                     phospho.elm.eu.org/
PhosphoSitePlus                 www.phosphositePlus.org/
PHOSIDA                         www.phosida.com/
PHOSPHORYLATION SITE DATABASE   www.phosphorylation.biochem.vt.edu/
Phospho.3D                      www.phospho3d.org/
Prediction of new instances of P-sites
Phospho.ELM                     phospho.elm.eu.org/
Scansite                        scansite.mit.edu/
NetPhos                         www.cbs.dtu.dk/services/NetPhos/
NetPhosK                        www.cbs.dtu.dk/services/NetPhos/
NetworKIN                       networkin.info/search.php
KinasePhos                      KinasePhos.mbc.nctu.edu.tw/
Predikin                        predikin.biosci.uq.edu.au/
69
Phospho.ELM phospho.elm.eu.org
Database of experimentally verified
phosphorylation sites in eukaryotic proteins
Current release contains
42,914 instances (fully linked to literature references)
299 kinases
11,224 sequences
8,698 substrates
70
PRACTICE Go to the Phospho.ELM website and search
P-sites for p53
73
ELM and Phospho.ELM are interconnected
74
PhosphoBlast
76
Structural information on P-sites and 3D scan
77
Phospho.3D
http://www.phospho3d.org/
PRACTICE Go to the Phospho.3D website and search
all the substrates of the Src kinase
81
Suggestions to predict P-sites in unknown
sequences
MEESQSDISLELPLSQETFSGLWKLLPPEDILPSPHCMDDLLLPQDVEEF
FEGPSEALRVSGAPAAQDPVTETPGPVAPAPATPWPLSSFVPSQKTYQGN
YGFHLGFLQSGTAKSVMCTYSPPLNKLFCQLAKTCPVQLWVSATPPAGSR
VRAMAIYKKSQHMTEVVRRCPHHERCSDGDGLAPPQHLIRVEGNLYPEYL
EDRQTFRHSVVVPYEPPEAGSEYTTIHYKYMCNSSCMGGMNRRPILTIIT
LEDSSGNLLGRDSFEVRVCACPGRDRRTEEENFRKKEVLCPELPPGSAKR
ALPTCTSASPPQKKKPLDGEYFTLKIRGRKRFEMFRELNEALELKDAHAT
EESGDSRAHSSYLKTKKGQSTSRHKKTMVKKVGPDSD
?
82
Exploring unknown protein sequences
- Go to UniProt (or BLAST your sequence against
the UniProt database) and explore the
sequence annotation
- Go to Phospho.ELM and scan the sequence
- Go to PHOSIDA and PhosphoSitePlus and do the
same
- Use different predictors and select only high
scoring sites
- Use evolutionary information: is the site
conserved?
- Use domain (SMART and Pfam) databases: is
the site inside a domain?
- Use structural information if available:
is the site exposed? Is it in a flexible
region?
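Combining several predictors and keeping only high-scoring sites can be sketched as a voting step. The tool names, positions and scores below are hypothetical, and the scores are assumed to be already rescaled to a common 0-1 range, which real predictors do not provide out of the box.

```python
from collections import Counter


def consensus_sites(predictions, min_score=0.8, min_tools=2):
    """Positions predicted with a high score by at least min_tools tools.

    predictions: {tool name: {site position: score between 0 and 1}}
    """
    votes = Counter()
    for sites in predictions.values():
        for position, score in sites.items():
            if score >= min_score:
                votes[position] += 1
    return sorted(pos for pos, n in votes.items() if n >= min_tools)


# Hypothetical output of three predictors for one sequence.
predictions = {
    "tool_A": {15: 0.95, 33: 0.60, 46: 0.85},
    "tool_B": {15: 0.90, 46: 0.40, 315: 0.88},
    "tool_C": {15: 0.81, 33: 0.92, 315: 0.91},
}

print(consensus_sites(predictions))
```

The surviving positions can then be checked against conservation, domain and structural information before choosing candidates for experimental tests.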
83
Exploring unknown protein sequences
When all information is collected, only retain
sites predicted by more than one tool
Amongst these, for further experimental tests,
preferably choose sites that are