Title: Dressing up Protein Sequences with Bioinformatics Data
1Dressing up Protein Sequences with
Bioinformatics Data
Cédric Notredame
3Before We Start
5Important Things We will not Talk about here !!!
6Finding Out Simple Things About Your Protein
7Doing Biochemistry in your Computer
8Finding Out Simple things On Expasy
9Expasy: Where it all starts
10Expasy is a MAJOR service
11Using ProtParam
12Using ProtParam
13Using ProtParam
14Using ProtParam
15Using ProtParam: What it does not do for you!
16Comparing Your Sequence With Itself
17Comparing Your Protein With Itself
-Does my protein contain a repeated domain?
-Does my protein contain low complexity segments?
18The EMBnet Server
19Dotlet: Before Starting
20Dotlet: Sequence Input
21Dotlet: Getting it to work!
22Dotlet: Getting it to work!
Disappointing!
23Dotlet: Getting it to work!
24Dotlet: Getting it to work!
Tuning Dotlet: The Right Knob at the Right Moment
25Dotlet: Fixing the Zoom Factor!
1. Set the ZOOM so that the ENTIRE protein appears
26Dotlet: Choosing the Right Window
2. Set the Window Size to the domain size
-Repeat size 50, Window Size 51
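The window idea behind a dot plot can be sketched in a few lines of Python. This is a simplified illustration, not Dotlet's own code: it only counts identical residues inside each window, whereas Dotlet scores windows with a substitution matrix. The function names, the toy sequence and the `min_matches` cut-off are all assumptions made for the example.

```python
def window_scores(seq_a, seq_b, window):
    """Count identical residues for every pair of window start positions."""
    scores = {}
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # Compare the two windows residue by residue along the diagonal.
            scores[(i, j)] = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
    return scores


def repeat_offsets(seq, window, min_matches):
    """Offsets at which a window of seq matches a later window of seq."""
    scores = window_scores(seq, seq, window)
    return sorted(
        {j - i for (i, j), s in scores.items() if j > i and s >= min_matches}
    )


# Toy sequence (hypothetical): a 10-residue unit, a linker, then the unit again.
toy = "MKTAYIAKQR" + "GGSGG" + "MKTAYIAKQR"
print(repeat_offsets(toy, window=10, min_matches=10))  # [15]
```

When a sequence is compared with itself, a repeat shows up as a diagonal that is shifted away from the main diagonal; the shift (15 residues in the toy case) is the distance between the two copies. A window close to the repeat length gives the cleanest signal, which is why the window size is tuned to the domain size.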
27Dotlet: Using the Threshold
[Figure: number of dots (log curve) plotted against score, with Threshold 1 and Threshold 2 marked]
28Dotlet: Using the Threshold
[Figure: number of dots plotted against score; the thresholds T1 and T2 split the scores into BLACK, GREY and WHITE regions]
29Dotlet: Choosing the Right Window
3. Set the Threshold
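The effect of the two thresholds can be illustrated with a small mapping from window score to pixel colour. This is a sketch under an assumption that is not spelled out in the slides: scores at or below the lower threshold are drawn black, scores at or above the upper threshold are drawn white, and scores in between get a linearly interpolated grey. The function name `grey_level` and the 0 to 255 scale are illustrative choices.

```python
def grey_level(score, low, high):
    """Map a window score to a grey level between 0 (black) and 255 (white)."""
    if high <= low:
        raise ValueError("the upper threshold must be above the lower one")
    if score <= low:
        return 0      # background: treated as noise
    if score >= high:
        return 255    # strong signal
    # Linear interpolation between the two thresholds.
    return round(255 * (score - low) / (high - low))


for score in (5, 12, 18, 25):
    print(score, grey_level(score, low=10, high=20))
```

Moving the lower threshold up hides more of the random background; moving the upper threshold down makes weak diagonals stand out.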
30Dotlet: Analyzing the Dotlet
Low Complexity Region
31Dotlet: Analyzing your Dotlet
32Introducing our naked sequence
MALRAGLVLG FHTLMTLLSP QEAGATKADH MGSYGPAFYQ
SYGASGQFTH EFDEEQLFSV DLKKSEAVWR LPEFGDFARF
DPQGGLAGIA AIKAHLDILV ERSNRSRAIN VPPRVTVLPK
SRVELGQPNI LICIVDNIFP PVINITWLRN GQTVTEGVAQ
TSFYSQPDHL FRKFHYLPFV PSAEDVYDCQ VEHWGLDAPL
LRHWELQVPI PPPDAMETLV CALGLAIGLV GFLVGTVLII
MGTYVSSVPR
33Introducing our naked sequence
34 1-Simple predictions
2-Repeated regions
35Finding Out about the Secondary Structure of Your Sequence: Trans-Membrane Domains
36Predicting Secondary Structures
-Does my protein contain a trans-membrane domain?
37ProtScale: Sliding Window Methods
Average
Sliding Window
38ProtScale: Sliding Window Methods
39ProtScale: Sliding Window Methods
FREL_CANAL
40ProtScale: Sliding Window Methods
Window Size: TM domain size
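The sliding-window average can be sketched as follows. This is a minimal illustration, not ProtScale's implementation: it uses the Kyte-Doolittle hydropathy scale, an unweighted mean, and a window of 19 residues, chosen here as an assumption because it is close to the length of a membrane-spanning helix. The sequence is the example protein used in these slides.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sliding_average(sequence, scale, window):
    """Mean scale value of every window of the given size along the sequence."""
    values = [scale[residue] for residue in sequence]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


SEQUENCE = (
    "MALRAGLVLGFHTLMTLLSPQEAGATKADHMGSYGPAFYQ"
    "SYGASGQFTHEFDEEQLFSVDLKKSEAVWRLPEFGDFARF"
    "DPQGGLAGIAAIKAHLDILVERSNRSRAINVPPRVTVLPK"
    "SRVELGQPNILICIVDNIFPPVINITWLRNGQTVTEGVAQ"
    "TSFYSQPDHLFRKFHYLPFVPSAEDVYDCQVEHWGLDAPL"
    "LRHWELQVPIPPPDAMETLVCALGLAIGLVGFLVGTVLII"
    "MGTYVSSVPR"
)
WINDOW = 19  # assumed window size, roughly one trans-membrane helix

profile = sliding_average(SEQUENCE, KYTE_DOOLITTLE, WINDOW)
best_start = max(range(len(profile)), key=profile.__getitem__)
print(
    f"most hydrophobic window starts at residue {best_start + 1}: "
    f"{SEQUENCE[best_start:best_start + WINDOW]} "
    f"(mean {profile[best_start]:.2f})"
)
```

A window much shorter than the domain gives a noisy profile, and a window much longer dilutes the hydrophobic peak, which is why the window size is matched to the size of the feature being sought.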
41ProtScale: Sliding Window Methods
42ProtScale: Making the Right Interpretation
43ProtScale: Making the Right Interpretation
44ProtScale: Making the Right Interpretation
45ProtScale: Making the Right Interpretation
46tmHMMpred: The state of the art
47tmHMMpred: The state of the art
48tmHMMpred: The state of the art
49 1-Simple predictions
2-Repeated regions
3-Secondary Structure
50Predicting Secondary Structures
-Does my protein contain a coiled-coil domain?
51Using the EMBnet COILS server
52Using the EMBnet COILS server
53Finding Out about the Secondary Structure of Your Sequence: Helices and Beta-Sheets
54Predicting Secondary Structures
-What is the secondary structure of my protein?
55Running PsiPred
Your email
56Running PsiPred
57How Trustworthy Is PsiPred?
58How Trustworthy Is PsiPred on my protein?
Do you need to see the homologs?
59Finding Out if your Sequence Is Modified
60Post-Translational Modifications
-Is my protein modified after its translation?
-Phosphorylated
-Glycosylated
61What is a Prosite pattern ?
-A PROSITE pattern is a protein WORD conserved in many sequences
PVAILL
-PROSITE patterns let you identify protein families or important features of your sequence
62What is a Prosite pattern ?
-A PROSITE Pattern Lets You identify a protein
family JUST LIKE the silver lady lets you
identify a certain brand of cars
63Prosite Patterns can describe complex signatures
[RK]-x-[ST]
-Reads as follows: an Arginine or a Lysine, followed by any one residue, followed by a Serine or a Threonine
64Prosite Patterns can describe complex signatures
C-[DES]-x-C-x(3)-I-x(3)-R-x(4)-P-x(4)-C-x(2)-C
Is a signature for Zn finger proteins that bind DNA
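Because a PROSITE pattern is a compact regular expression, it can be translated into one. The sketch below handles only a subset of the PROSITE syntax: single residues, `x` for any residue, square brackets for alternatives, curly braces for excluded residues, and `(n)` or `(n,m)` repeat counts. The function names are assumptions made for the example, and this is not the code behind any PROSITE scanning server.

```python
import re

# One PROSITE element: a residue, x, [alternatives] or {exclusions},
# optionally followed by a repeat count such as (3) or (2,4).
ELEMENT = re.compile(r"([A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a (subset of) PROSITE pattern syntax into a Python regex."""
    parts = []
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # any residue except these
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


def find_sites(pattern, sequence):
    """Return (1-based position, matched text) for every match, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


print(prosite_to_regex("[RK]-x-[ST]"))
print(prosite_to_regex("C-[DES]-x-C-x(3)-I-x(3)-R-x(4)-P-x(4)-C-x(2)-C"))
print(find_sites("[RK]-x-[ST]", "AAKGSAARRT"))  # toy sequence
```

A three-residue pattern such as `[RK]-x-[ST]` will match by chance in most proteins, whereas a long signature such as the zinc finger pattern rarely does. This is why hits from short patterns need to be treated with much more caution.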
65Using PrositeScan
MALRAGLVLG FHTLMTLLSP QEAGATKADH MGSYGPAFYQ
SYGASGQFTH EFDEEQLFSV DLKKSEAVWR LPEFGDFARF
DPQGGLAGIA AIKAHLDILV ERSNRSRAIN VPPRVTVLPK
SRVELGQPNI LICIVDNIFP PVINITWLRN GQTVTEGVAQ
TSFYSQPDHL FRKFHYLPFV
66Using PrositeScan Short Patterns
67Using PrositeScan Short Patterns
68Using PrositeScan Short Patterns
We cannot do much with this weak but exciting result. We can only REMEMBER and WAIT
69Using PROSITE-Scan Structure
70Other Means of Predicting Post-Translational Modifications
71 1-Simple predictions
2-Repeated regions
3-Secondary Structure
4-PROSITE motifs
72Identifying Domains
73The Idea of Domains
74The Three Major Resources For Searching Domain
Collections
75InterPro: A Federation of Databases
76The Three Major Resources For Searching Domain
Collections
Domain servers (and domain collections) are like good Italian restaurants: they all look similar, but each of them is a bit special and has its own recipes
77Finding Domains
-Is my Protein made of known domains ?
78InterPro: The Idea of Domains
79Using InterPro: Asking a question
80Using InterPro: Asking a question
81Using CDsearch: Asking a question
82Using CDsearch: Asking a question
NCBI Domain
E-Values
83Finding Domains
-How can I be sure that the domain Prediction of
my Protein is real ?
84Using EMBnet PFscan
85Using EMBnet PFscan
Important Position that is Well conserved in our
sequence
86 1-Simple predictions
2-Repeated regions
3-Secondary Structure
4-PROSITE motifs
5-Conserved Domains
87Finding Homologues
90BLAST: A summary
-BLAST compares YOUR SEQUENCE with ALL THE SEQUENCES in a database
-BLAST reports the sequences that are the most similar to your sequence
-If your sequence is more than 30% identical to another sequence (over more than 100 residues), these two sequences probably have the same origin, the same structure and related biochemical functions.
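The rule of thumb above (more than 30% identity over more than 100 residues) can be written down as a small check on a pairwise alignment. This is a sketch under stated assumptions: the two strings are already aligned and of equal length, gaps are written as `-`, and identity is counted over the full alignment length, gap columns included. The function names and the default cut-offs are illustrative, and passing the check suggests homology without proving it.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns holding the same residue in both rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("empty alignment")
    identical = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * identical / len(aligned_a)


def probably_homologous(aligned_a, aligned_b, min_identity=30.0, min_length=100):
    """Apply the rule of thumb: identity and alignment length both above the cut-offs."""
    return (
        len(aligned_a) > min_length
        and percent_identity(aligned_a, aligned_b) > min_identity
    )


# Toy alignments (hypothetical): 40 identical columns out of 120, then a short one.
long_a = "A" * 120
long_b = "A" * 40 + "C" * 80
print(round(percent_identity(long_a, long_b), 1), probably_homologous(long_a, long_b))
print(probably_homologous("MKTAYIAK", "MKTAYIAK"))
```

The length condition matters as much as the identity: a perfect match over eight residues says very little, while a modest identity sustained over a full domain is hard to obtain by chance.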
91 Making Sense of it all with the 3D structure
92Using Structural Information
-What is the 3D structure of my protein ?
93Using Structural Information: Blasting against PDB
94Finding Structural Homologues with BLAST
95Displaying Your Structure
96Displaying Your Structure
97Displaying Your Structure
98Displaying Your Structure
99Highlighting Interesting Bits
Consistency!
100 1-Simple predictions
2-Repeated regions
3-Secondary Structure
4-PROSITE motifs
5-Conserved Domains
6-3D structures
101Finding out about the Genetic Environment of your
Protein
102Using Genomic Information
-Where is my protein in the human genome?
-Where are its homologues?
103Using Genomic Information
104Blasting the Human Genome
105Blasting the Human Genome
106Blasting the Human Genome: Finding Out about the Neighborhood of your Gene
107Blasting the Human Genome: Finding Out about the Neighborhood of your Gene
108Blasting the Human Genome: Finding Out about the Neighborhood of your Gene
109Blasting the Human Genome: Finding Out about the Neighborhood of your Gene
110Blasting the Human Genome: Finding Out about the Neighborhood of your Gene
111 Wrapping it up!
112 1-Simple predictions
2-Repeated regions
3-Secondary Structure
4-PROSITE motifs
5-Conserved Domains
6-3D structures
7-Genomic Environment
113 5-Conserved Domains
Two Domains
6-3D structures
Tm Domain
2-Repeated regions
3-Secondary Structure
Immunology Related
5-Conserved Domains
7-Genomic Environment