Title: ENCODE: finished pilot phase (Francesca Camilli, update
1Projects and collaborations
- ENCODE: finished pilot phase (Francesca Camilli, update)
- Tress, M., Martelli, P.L., Frankish, A., Reeves, G., Wesselink, J.J., Yeats, C., Olason, P.I., Albrecht, M., Hegyi, H., Giorgetti, A., Raimondo, D., Lagarde, J., Laskowski, R., Lopez, G., Sadowski, M.I., Watson, J., Fariselli, P., Rossi, I., Nagy, A., Kai, W., Stoerling, Z., Orsini, M., Assenov, Y., Blankenburg, H., Huthmacher, C., Ramirez, F., Schlicker, A., Denoeud, F., Jones, P., Kerrien, S., Orchard, S., Birney, E., Brunak, S., Casadio, R., Guigo, R., Harrow, J., Hermjakob, H., Jones, D.T., Lengauer, T., Orengo, C., Patthy, L., Thornton, J., Tramontano, A., Valencia, A.
- The implications of alternative splicing in the ENCODE protein complement. Proc. Natl. Acad. Sci., in press.
- Server and web service for automatic protein modelling. Splicing variants analysis. At CRS4, in collaboration with Mateo Floris and Massimiliano Orsini.
- GCSF and beta-interferon: collaboration with BioKer and Maria Valentini (CRS4). MD simulations and protein-protein docking. Next: final analysis of the bioinformatics part and comparison of the results with the experimental counterpart.
- ITCH: collaboration with the group of Prof. Gerry Melino. Modelling, MD simulations, normal modes analysis, phosphorylation sites analysis. Next: experimental counterpart, new analysis of the interface and design of new experiments.
- Ion channels: collaboration with the C. Micheletti and P. Carloni groups at SISSA. Loop modelling, normal modes analysis. Next: analysis of how loop conformations influence the channels.
- Cancer project: Cargo interface, modelling, mutations mapping. Next: check of automatic models and mapping of mutations (annotation on PMDB).
- BABP protein: collaboration with Henriette Molinari's group. Bioinformatic searches.
- Guariento M., Raimondo D., Assfalg M., Zanzoni S., Esente P., Ragona L., Tramontano A. and Molinari H.
- Identification and functional characterization of the bile acid transport proteins in non-mammalian ileum and mammalian liver. Proteins 2007, in press.
- Glycodelin proteins: collaboration with Henriette Molinari's group. Modelling, protein-protein interaction. Next: dimer interface analysis and design of experiments.
2Cagliari 31-05-2007 Project ITCH/p73
- Alejandro Giorgetti
- Domenico Raimondo
- Anna Tramontano
3p73
- Structural and functional homologue of p53, able to transactivate the promoters of genes involved in apoptosis and cell cycle regulation.
- p53 is the most frequently mutated and most intensely studied tumor suppressor gene.
- After DNA damage or proto-oncogene activation, p53 is stabilized (its ubiquitin-dependent degradation is reduced) and exerts its anti-tumorigenic activity by inducing cell cycle arrest or apoptosis.
- p73 vs. p53: important difference in the ubiquitylation process (different E3).
4- Ubiquitin
- Proteins are usually tagged for selective
destruction in proteolytic complexes called
proteasomes by covalent attachment of ubiquitin,
a small, compact, highly conserved protein.
However, some proteins may be degraded by
proteasomes without ubiquitination. An isopeptide
bond links the terminal carboxyl of ubiquitin to
the ε-amino group of a lysine residue of a
"condemned" protein.
5- Three enzymes are involved, designated E1, E2 and E3.
- Initially the terminal carboxyl group of ubiquitin is joined in a thioester bond to a cysteine residue on the Ubiquitin-Activating Enzyme (E1). This is the ATP-dependent step.
- The ubiquitin is then transferred to a sulfhydryl group on a Ubiquitin-Conjugating Enzyme (E2).
6- A Ubiquitin-Protein Ligase (E3) then promotes transfer of ubiquitin from E2 to the ε-amino group of a Lys residue of a protein recognized by that E3, forming an isopeptide bond.
- There are many distinct Ubiquitin Ligases with differing substrate specificity.
- One E3 is responsible for the N-end rule.
- Some are specific for particular proteins.
7- More ubiquitins are added to form a chain of ubiquitins.
- The terminal carboxyl of each ubiquitin is linked to the ε-amino group of a lysine residue (Lys29 or Lys48) of the adjacent ubiquitin.
- A chain of 4 or more ubiquitins targets proteins for degradation in proteasomes. (Attachment of a single ubiquitin to a protein has other regulatory effects.)
8- Ubiquitin Ligases (E3) mostly belong to two families:
- Some Ubiquitin Ligases have a HECT domain containing a conserved Cys residue that participates in transfer of activated ubiquitin from E2 to a target protein.
- Some Ubiquitin Ligases contain a RING finger domain in which Cys and His residues are ligands to 2 Zn ions.
9Model of Ubiquitin Transfer and Ubiquitin Chain
Elongation by HECT Domain Ubiquitin Ligases
[Figure labels: HECT, UbcH7]
10Evolution of protein structure families
[Figure: scale of sequence identity (% identical residues, ticks at 90, 70, 50, 30, 10), annotated with: Drug design?, Biochemistry?, X-ray crystallography (MR), Molecular Biology?. After Chothia & Lesk (1986).]
11Comparative Modeling
Workflow: Target sequence + Known Structures (templates) → Template(s) selection → Sequence Alignment → Structure Modeling → Structure Evaluation → Final Structural Models
Target sequence: gthTEII MSSPQAPEDGQGCGDRGDPPGDLRSVLVTTV LNLEPLDEDLF RGRHYWVPAKRLFGGQIVGQ ALVAAAKSVSEDVHVHSLHCYFVRAGDPK LP
12Comparative Modeling
Known Structures (templates)
- Protein Data Bank (PDB): http://www.pdb.org
- Template database
- Split into single chains
- Check the quality of the structures
13Comparative Modeling
Template(s) selection
- Sequence similarity / fold recognition
- Analysis of the structure (resolution, experimental method)
- Are there other atoms and/or compounds? Are they bound?
14Comparative Modeling
Sequence Alignment
- Fundamental for homology modelling.
- Global alignment
- A small error in the alignment can be fatal for the model.
- Remember: pairwise alignments whisper, multiple alignments speak out loud.
- Do we know anything else? Are there experiments?
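The global alignment step named on this slide can be illustrated with a minimal Needleman-Wunsch sketch. The scoring values (`match`, `mismatch`, `gap`) and the helper `percent_identity` are assumptions chosen for illustration. Real modelling pipelines use substitution matrices, affine gap penalties and, preferably, multiple alignments.

```python
def _pair_score(x, y, match, mismatch):
    """Score for aligning residue x with residue y."""
    return match if x == y else mismatch


def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + _pair_score(a[i - 1], b[j - 1], match, mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == (
            score[i - 1][j - 1] + _pair_score(a[i - 1], b[j - 1], match, mismatch)
        ):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Identical, non-gap columns as a percentage of the alignment length."""
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


best_score, target_row, template_row = needleman_wunsch("ACGT", "AGT")
```

A single misplaced gap in such an alignment shifts every downstream residue onto the wrong template position, which is why the slide stresses that small alignment errors can be fatal for the model.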
15Comparative Modeling
Structure Modeling
- Fragment assembly (template-based fragment assembly: SwissMod).
- Minimization of the deviation from spatial restraints (satisfaction of spatial restraints: MODELLER).
16Comparative Modeling
Structure Evaluation
- Errors in template selection
- Iterative cycles of alignment, modelling and evaluation.
17X-Ray
Orazio Romeo (master Sardinia)
Ubiquitin ligases (E3) act together with the
ubiquitin activating enzyme (E1) and the
ubiquitin conjugating enzyme (E2) to catalyze
protein ubiquitylation
Template used: 1nd7 (80% ID)
18E2 UbcH7 (X-ray)
19Analysis of the interaction surface (MOLMOL) and
3.5 ns MD simulations (NAMD)
20Binding Interface
21Normal modes analysis
22Normal modes analysis: Beta-GM program
23Putative hinge regions
24- Two hinge regions found with Beta-GM (Gaussian model)
[Figure labels: C-lobe, N-lobe, N-lobe small subdomain]
25Beta-GM program
- Normal modes analysis of the complex: vibrational modes at low frequencies.
- Normal modes from an MD simulation: tens of nanoseconds.
- The Beta-GM program implements a coarse-grained model to describe the dynamics of the protein (β-Gaussian network model).
- Provides a reliable (by comparison against full-atom MD simulations) description of concerted large-scale rearrangements in proteins.
- The concerted motions are calculated within the quasi-harmonic approximation of the free energy, F, around a protein's native state.
- A displacement from the native state, $\delta R = \{\delta r_1, \delta r_2, \ldots, \delta r_n\}$ ($\delta r_i$ being the displacement of Cα atom i), is associated with the change in free energy
- $\Delta F = \tfrac{1}{2}\, \delta R^{T}\, \mathcal{F}\, \delta R$
- where $\mathcal{F}$ is an interaction matrix constructed from the knowledge of contacting Cα and Cβ centroids in the native state.
- The large-scale motions of the system correspond to the eigenvectors of $\mathcal{F}$ having the smallest nonzero eigenvalues.
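The idea of reading large-scale motions off the lowest nonzero eigenvalues of an interaction matrix can be sketched with a plain Cα-only Gaussian network model. This is a simplification and not the Beta-GM program itself: it uses one scalar per residue, ignores Cβ centroids, and the coordinates, the `cutoff` value and all function names are assumptions made for illustration. A small Jacobi eigenvalue routine keeps the sketch free of external libraries.

```python
import math


def kirchhoff(coords, cutoff):
    """Contact matrix: -1 for each pair closer than cutoff, contact count on the diagonal."""
    n = len(coords)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                K[i][j] = K[j][i] = -1.0
                K[i][i] += 1.0
                K[j][j] += 1.0
    return K


def jacobi_eigen(matrix, tol=1e-20, max_sweeps=100):
    """Eigenvalues and eigenvectors (columns of V) of a small symmetric matrix."""
    n = len(matrix)
    A = [row[:] for row in matrix]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) element.
                diff = A[q][q] - A[p][p]
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2 * A[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for M in (A, V):  # rotate columns p and q of A, then of V
                    for k in range(n):
                        mp, mq = M[k][p], M[k][q]
                        M[k][p], M[k][q] = c * mp - s * mq, s * mp + c * mq
                for k in range(n):  # rotate rows p and q of A
                    ap, aq = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * ap - s * aq, s * ap + c * aq
    return [A[i][i] for i in range(n)], V


def slow_modes(coords, cutoff=7.0):
    """Nonzero modes sorted from slowest to fastest as (eigenvalue, eigenvector)."""
    values, vectors = jacobi_eigen(kirchhoff(coords, cutoff))
    order = sorted(range(len(values)), key=lambda i: values[i])
    return [(values[i], [row[i] for row in vectors]) for i in order if values[i] > 1e-8]


# Hypothetical three-bead chain, 3.8 Å apart, with only neighbours in contact.
chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
modes = slow_modes(chain, cutoff=5.0)
slowest_value, slowest_vector = modes[0]
```

In this toy chain the slowest mode moves the two end beads in opposite directions while the middle bead stays still. That is the kind of pattern used to propose hinge regions from the low-frequency modes.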
26I. Template based fragment assembly
d) Energy minimization
- The modelling process will produce close contacts between atoms and unfavourable bond lengths.
- → Obtain the correct geometries.
- Energy minimization that is too extensive can move us away from the true structure.
- SwissModel uses the GROMOS 96 force field.
27 E_electrostatic. The electrostatic energy is evaluated using Restrained Electrostatic Potential (RESP) partial charges. These charges have the property of accurately reproducing the electrostatic potential multipoles outside the molecule, and they were calculated in the following way: ab initio quantum chemical calculations are performed on small molecules, and the electrostatic potential values V_j are calculated on M grid points outside the molecule.
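Once partial charges are available, the electrostatic term reduces to a pairwise Coulomb sum. The sketch below is a minimal illustration under stated assumptions: vacuum dielectric, no cutoff, no exclusion of bonded pairs, and made-up charges rather than real RESP values. The constant 332.0637 converts e²/Å to kcal/mol. The function name `electrostatic_energy` is hypothetical.

```python
import math

# Coulomb constant in kcal·Å/(mol·e²), for charges in units of e and distances in Å.
COULOMB_CONSTANT = 332.0637


def electrostatic_energy(atoms):
    """Pairwise Coulomb energy in kcal/mol.

    atoms is a list of (partial_charge_in_e, (x, y, z)_in_angstrom) tuples.
    Assumes vacuum (dielectric constant 1) and counts every pair exactly once.
    """
    energy = 0.0
    for i in range(len(atoms)):
        q_i, r_i = atoms[i]
        for j in range(i + 1, len(atoms)):
            q_j, r_j = atoms[j]
            energy += COULOMB_CONSTANT * q_i * q_j / math.dist(r_i, r_j)
    return energy


# Illustrative charges only, not RESP values from any force field.
ion_pair = [(1.0, (0.0, 0.0, 0.0)), (-1.0, (1.0, 0.0, 0.0))]
ion_pair_energy = electrostatic_energy(ion_pair)
```

Opposite charges give a negative (favourable) energy and like charges a positive one. A production force field would additionally skip or scale bonded neighbours.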
28II. Modeling by Satisfaction of Spatial restraints
- Homology-derived: obtained from the alignment.
- Stereochemical: CHARMM parameter set (MacKerell et al., 1998).
- Van der Waals and Coulomb energies from the CHARMM force field.
- External: external distance restraints.
- Find the most probable structure starting from an alignment.
- Uses probability density functions.
- Minimizes deviations from the restraints.
Comparative protein modeling by satisfaction of spatial restraints. A. Šali and T.L. Blundell. J. Mol. Biol. 234, 779-815 (1993)
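The principle of restraint satisfaction can be sketched on a toy problem. Each distance restraint is treated as a Gaussian probability density with mean `mu` and width `sigma`, so the most probable structure minimizes the sum of squared, scaled deviations (the negative log of the product of the densities). The plain gradient descent used here is a stand-in chosen for brevity and is not MODELLER's optimizer. The restraint values, starting coordinates and function names are all assumptions.

```python
import math


def objective(coords, restraints):
    """Negative log-probability (up to a constant) of Gaussian distance restraints."""
    total = 0.0
    for i, j, mu, sigma in restraints:
        d = math.dist(coords[i], coords[j])
        total += (d - mu) ** 2 / (2 * sigma ** 2)
    return total


def satisfy_restraints(coords, restraints, step=0.05, n_iter=5000):
    """Move points downhill on the objective with fixed-step gradient descent."""
    coords = [list(p) for p in coords]
    for _ in range(n_iter):
        grad = [[0.0] * len(p) for p in coords]
        for i, j, mu, sigma in restraints:
            d = math.dist(coords[i], coords[j])
            if d == 0.0:
                continue  # direction undefined for coincident points
            factor = (d - mu) / (sigma ** 2 * d)
            for k in range(len(coords[i])):
                g = factor * (coords[i][k] - coords[j][k])
                grad[i][k] += g
                grad[j][k] -= g
        for p, g in zip(coords, grad):
            for k in range(len(p)):
                p[k] -= step * g[k]
    return coords


# Hypothetical restraints (i, j, mean distance, sigma) describing a 3-4-5 triangle.
restraints = [(0, 1, 3.0, 1.0), (0, 2, 4.0, 1.0), (1, 2, 5.0, 1.0)]
start = [(0.0, 0.0), (2.5, 0.3), (0.4, 3.5)]
model = satisfy_restraints(start, restraints)
```

In a real application the restraints come from the alignment with the templates, from stereochemistry and from external data, and many of them conflict, so the final model is a compromise and not an exact solution as in this toy case.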
29Cancer Project
- Domenico Raimondo
- Alejandro Giorgetti
30Sjoblom et al. The consensus coding sequences of human breast and colorectal cancers. Science. 2006
- From this screening our initial set of sequences consisted of 189 CAN genes: 122 breast genes and 69 colorectal ones (two genes overlap), for a total of 535 peptides.
- Steps for initial analysis:
- BLAST search on PDB (in red: genes that had not been strongly suspected to be involved in cancer).
- BLAST search on the BIND database.
- Semiautomatic modelling (HHpred toolkit and visual inspection).
- Submission to PMDB.
- Next: annotation of the mutations directly on PMDB.
- Widget for the Cargo web server.
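The gene count on this slide follows from inclusion-exclusion: genes found in both tumour types are counted once. A minimal check, using only the numbers quoted above (the function name is hypothetical):

```python
def union_size(n_first, n_second, n_shared):
    """Size of the union of two sets that share n_shared elements."""
    return n_first + n_second - n_shared


breast_genes = 122
colorectal_genes = 69
shared_genes = 2
can_genes = union_size(breast_genes, colorectal_genes, shared_genes)
```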
31Encode
Protein bioinformatics: 'Proteins at work'
- Models submitted to PMDB: 30 (20 to 97% seq. ID, and total coverage).
- Sequences with solved structure: 25 (always fewer than 50 aa missing).
- Domain-exon mapping: only 3 structures match to within 5 aa.
- Models of splicing variants: 70
- 47 sequences and their splicing variants were analysed: 16 have no alternative splicing, 11 show alternative splicing in the non-coding regions, and 20 have splicing variants (in general 2-3).
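The breakdown of the 47 analysed sequences can be checked and turned into fractions with a few lines. The category labels are paraphrased from the slide and the dictionary layout is an assumption made for illustration.

```python
# Counts quoted on the slide for the 47 analysed sequences.
splicing_classes = {
    "no alternative splicing": 16,
    "alternative splicing in non-coding regions only": 11,
    "protein-level splicing variants": 20,
}

total_sequences = sum(splicing_classes.values())

# Share of each class as a percentage of all analysed sequences.
percentages = {
    label: round(100.0 * count / total_sequences, 1)
    for label, count in splicing_classes.items()
}
```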
32- PARTIAL coverage (41 transcripts)
- Fewer than 50 (30) aa missing (at the 5' end)
- 11 with structural information
- 4 have splicing variants
- 2 identical sequences
- 1 missing internal exon (does not seem a possible structure)
- 1 missing internal exon (outside the X-ray coverage)
- More than 50 aa (150-300) missing
- 8 with structural information
- 6 have splicing variants
- 2 identical sequences
- 4 missing internal exons (outside the X-ray coverage)
- TOTAL coverage (39 transcripts)
- 22 have splicing variants (2 to 4)
- 5 identical sequences
- 2 alternative exons at the 5' end
- 7 alternative internal exons
33Encode
AC004039.4 - 001 -002
AC069356.1 - 001 -002
34Encode Results
35Discussion
- Unlike most evolutionarily related sequences, the splice isoforms in this set are sequence identical except for single deletions or insertions. Many of these are relatively large.
- Changes at the C-terminus and at the N-terminus tend to be swaps. Internal changes tend to be deletions.
- In 73 (22) cases the PDB structures will be modified by the insertions or deletions in the splicing variants.
- In 49 (19) cases there must be a major refolding.
- In 24 (3) cases the effects on the structure should be small.
- 994 sequences have a PFAM domain.
- 42.5% (423 sequences) have a PFAM domain that is split in two.
- 53 isoforms have two broken (interrupted) domains.
- In 3 sequences, 3 domains have been split.