Posttranslational modifications - PowerPoint PPT Presentation

Title: Posttranslational modifications


1
Welcome to Fortaleza
2
First a big welcome to the speakers
  • We have a wonderful cast of speakers coming from
    all over the world
  • Many of them have played, directly or indirectly,
    an important role in the history of Swiss-Prot
  • In the name of the organizing committee I thank
    all of them for agreeing to participate in this
    anniversary meeting.

3
Then a big welcome to the UniProters
  • Since 2000, every two years, the members of the
    Swiss-Prot groups at SIB and EBI have attended a
    retreat to discuss various aspects of their
    collaborations
  • This meeting doubles up as a very special
    retreat. It is an opportunity for attendees and
    speakers to tell us what they think we should be
    doing!

4
Welcome and thank you to all attendees
  • For those coming from all over the world we know
    it was not easy to come to Brazil
  • We hope that this meeting will be an opportunity
    for you to listen to interesting talks
  • But more importantly meetings are essential to
    network and to start or pursue collaborations
  • So please enjoy and make the most of these four
    days that we will spend together.

5
And last, but not least, a big thank you
to all the sponsors!
6
A few important last-minute announcements and
reminders
  • In the external pocket of the conference bag you
    will find many important things, including:
  • The pocket guide (the program at a glance)
  • The instructions to access the Wi-Fi Internet
    service
  • The ballot for the best poster award
  • Information about a survey concerning
    UniProtKB/Swiss-Prot.

7
The Swiss-Prot survey
  • The Swiss-Prot annotators who are carrying out
    the survey have a small red sticker on their name
    badge
  • People who have answered the survey will receive
    a small yellow sticker to put on their name badge
    so that they do not get asked to participate over
    and over again!

8
Protein Spotlight book
  • In your bag you will find a copy of Tales from a
    small world
  • It is a book containing all the Protein Spotlight
    articles published since 2000
  • We can offer a copy of this book to all of you
    thanks to Current Biodata who fully sponsored the
    cost of its printing.

9
Program changes
  • Due to flight problems (Varig!) we lost 3
    speakers: Terri Attwood, Philipp Bucher and
    Minoru Kanehisa
  • We will use 2 of the 3 slots for different
    tutorials on Swiss-Prot; the third slot will be
    used to get to the beach an hour earlier on
    Tuesday!
  • Nasri Nahas's talk will be given by Ron Appel, as
    Nasri is busy trying to get his family out of
    Lebanon (last-minute news: they are safely back
    in Geneva)
  • Vitek Tracz's talk will be given by Matthew
    Cockerill, who has overall responsibility for
    BioMed Central
  • Really last minute: we just learned that Gunnar
    fell sick on the way here and has turned back to
    Sweden.

10
Speakers
  • Try to end your talk 5 minutes before the end of
    the allotted time slot so as to leave time for a
    few questions
  • For each day of the conference, a Swiss-Prot team
    member is responsible for making sure we are on
    time and for moderating the question session
  • You will have in front of you a digital timer
    that will show you how much time is left
  • Once your time is over, it will ring and the
    moderator will make every effort to expel you
    from the podium!

11
Speakers - 2
  • You can use the podium microphones, which will
    ensure that your image is captured correctly on
    camera, but you can also use a wireless lapel
    mike
  • Please use the mouse rather than the laser
    pointer to point at objects in your presentation.

12
The SwissProt Song
Genome annotators, with your big machines
If you didn't have Swiss-Prot, you wouldn't find a thing
- with your big machines -
If you didn't have Swiss-Prot you would not find a thing
Ain't no good the software, the grid and the middleware
If it's not for Swiss-Prot you wouldn't get nowhere
- with your middleware -
If it's not for Swiss-Prot you would not get nowhere
13
"Plus ça change, plus c'est la même chose": the
next 20 years
14
This will not be a talk on
the (pre)-history of Swiss-Prot
15
The universe in which Swiss-Prot evolves
1953: 1st sequence (bovine insulin)
1986: 4000 sequences
2006: 3.5 million sequences
Where will it stop?
179'000'025'042 (179 billion)
16
179'000'025'042
1st estimate: 30 million species (1.5 million
named)
2nd estimate:
  20 million bacteria/archaea x 4'000 genes
  5 million protists x 6'000 genes
  3 million insects x 14'000 genes
  1 million fungi x 6'000 genes
  0.6 million plants x 20'000 genes
  0.2 million molluscs, worms, arachnids, etc.
  x 20'000 genes
  0.2 million vertebrates x 25'000 genes
The calculation:
  2x10^7 x 4000 + 5x10^6 x 6000 + 3x10^6 x 14000
  + 10^6 x 6000 + 6x10^5 x 20000 + 2x10^5 x 20000
  + 2x10^5 x 25000 + 25000 (Craig Venter)
  + 42 (Douglas Adams) = 179'000'025'042
Caveat: this is an estimate of the number of
potential sequence entries, but not that of the
number of distinct protein entities in the
biosphere.
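The arithmetic behind the 179 billion figure can be checked with a short Python sketch. The species counts and genes-per-genome values are those quoted on the slide; the names `GROUPS`, `EXTRAS` and `potential_sequence_entries` are made up for this illustration.

```python
# Species counts and genes per genome, as quoted on the slide.
GROUPS = {
    "bacteria/archaea": (20_000_000, 4_000),
    "protists": (5_000_000, 6_000),
    "insects": (3_000_000, 14_000),
    "fungi": (1_000_000, 6_000),
    "plants": (600_000, 20_000),
    "molluscs, worms, arachnids, etc.": (200_000, 20_000),
    "vertebrates": (200_000, 25_000),
}

# The two tongue-in-cheek extra terms from the slide.
EXTRAS = {"Craig Venter": 25_000, "Douglas Adams": 42}


def potential_sequence_entries(groups=GROUPS, extras=EXTRAS):
    """Sum species x genes over all groups, then add the extra terms."""
    return sum(n * genes for n, genes in groups.values()) + sum(extras.values())


total_species = sum(n for n, _ in GROUPS.values())
print(f"species: {total_species:,}")
print(f"potential sequence entries: {potential_sequence_entries():,}")
```

The per-group species counts add up to the 30 million species of the first estimate, and the grand total reproduces 179'000'025'042.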
17
When will UniProtKB be complete?
  • Swiss-Prot:
  • In July 2009: 500000 entries
  • In 2013: 1 million entries
  • In 2026 (40th anniversary): 10 million entries
  • In 2036 (50th anniversary): 100 million entries.
  • TrEMBL:
  • In May 2080 TrEMBL will have reached 10 billion
    entries
  • We can't compute with Excel when we will reach
    179 billion entries
  • But we are confident these dates are worthless, as
    new sequencing techniques will make all of
    these projections a very futile exercise!
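As a rough illustration of what the Swiss-Prot milestones above imply, the sketch below computes the doubling time between consecutive milestones, assuming exponential growth between each pair. Treating "July 2009" as the decimal year 2009.5 is an assumption made here, and the function names are made up; the slide does not say how the projections were derived.

```python
import math

# Swiss-Prot milestones from the slide: (decimal year, number of entries).
# "July 2009" is approximated as 2009.5 (an assumption for illustration).
MILESTONES = [
    (2009.5, 500_000),
    (2013.0, 1_000_000),
    (2026.0, 10_000_000),
    (2036.0, 100_000_000),
]


def doubling_time(t0, n0, t1, n1):
    """Years needed to double, assuming exponential growth from (t0, n0) to (t1, n1)."""
    return (t1 - t0) * math.log(2) / math.log(n1 / n0)


def implied_doubling_times(milestones=MILESTONES):
    """Doubling time for each pair of consecutive milestones."""
    pairs = zip(milestones, milestones[1:])
    return [doubling_time(t0, n0, t1, n1) for (t0, n0), (t1, n1) in pairs]


for (t0, _), (t1, _), years in zip(MILESTONES, MILESTONES[1:], implied_doubling_times()):
    print(f"{t0:.1f} -> {t1:.1f}: doubling roughly every {years:.2f} years")
```

Under these assumptions the milestones correspond to a doubling time of roughly three to four years.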

18
Sequences
  • The bread of Swiss-Prot. And yes annotations are
    the butter!
  • >99% of the protein sequences originate from
    translation of mRNA or genomic sequences
  • Do we still need manual intervention to cater for
    sequences or can we just build smart filters to
    obtain those we want from TrEMBL?

19
So what is the current status?
  • A snapshot of the situation:
  • 28200 entries with 82000 sequence conflicts
  • 2600 entries with corrected frameshifts
  • 15100 entries with corrected initiation sites
  • 4300 entries with other sequence problems.
  • At least 43000 entries (19% of Swiss-Prot)
    required a minimal amount of curation effort so
    as to obtain the correct sequence.

20
Quality of protein information from genome
projects
  • Let's look at proteins originating from 3
    different genome projects
  • Drosophila: the example of what a curated (thanks
    to FlyBase) genome effort should look like; only
    1.8% of the gene models conflict with what we
    have in Swiss-Prot
  • Arabidopsis: a typical example of a genome where
    a lot of work was spent on annotation at the time
    it was sequenced, but where nothing has been
    done since (at least in public view); 19.5%
    of the gene models are erroneous
  • Tetraodon nigroviridis: the typical example of a
    quick and dirty automatic run through a genome
    with no manual intervention; >90% of the gene
    models produce incorrect proteins.

21
Human sequence entries as an example
  • We have about 14500 human entries in Swiss-Prot
  • 4300 entries contain information about 8000
    splice variants
  • 4600 entries contain information about 27000
    sequence variants
  • 7500 entries contain information about 22000
    sequence conflicts
  • On average each human entry is produced by
    merging together sequence information from 6.2
    different nucleotide sequence entries.

22
Take home message
  • Producing a clean set of sequences is not a
    trivial task
  • It is not getting easier as more and more types
    of sequence data get submitted
  • It is important to pursue our efforts in making
    sure we provide to our users the most correct set
    of sequences for a given organism.

23
Post-translational modifications (PTMs)
  • While sequences are important, they are generally
    not fully representative of the final biological
    entity: most proteins are the target of PTMs
  • PTMs are important at various levels, including
    the 3D structure, interactions, subcellular
    location and also the function
  • The story of the integration of PTMs in
    Swiss-Prot consists of 3 distinct parts
  • 1st part: a long time ago in a distant
    proteogalaxy

FT   MOD_RES      86     86       GAMMA-CARBOXYGLUTAMIC ACID.
FT   MOD_RES     110    110       HYDROXYLATION.
FT   CARBOHYD    203    203       POSSIBLE.
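The three old-style feature (FT) lines above each carry a feature key, a start position, an end position and a free-text description. A minimal Python sketch of reading such lines follows; it splits on whitespace rather than on fixed columns, assumes plain integer positions (real entries may contain uncertain positions, which this sketch does not handle), and the names `Feature` and `parse_ft_line` are made up for illustration.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    key: str          # e.g. MOD_RES, CARBOHYD
    start: int        # 1-based start position in the sequence
    end: int          # 1-based end position in the sequence
    description: str  # free-text description (uncontrolled in this era)


def parse_ft_line(line: str) -> Feature:
    """Parse one old-style FT line: 'FT', key, start, end, description."""
    fields = line.split(None, 4)
    if len(fields) < 4 or fields[0] != "FT":
        raise ValueError(f"not an FT line: {line!r}")
    description = fields[4].strip() if len(fields) == 5 else ""
    return Feature(fields[1], int(fields[2]), int(fields[3]), description)


EXAMPLE = """\
FT   MOD_RES      86     86       GAMMA-CARBOXYGLUTAMIC ACID.
FT   MOD_RES     110    110       HYDROXYLATION.
FT   CARBOHYD    203    203       POSSIBLE.
"""

features = [parse_ft_line(line) for line in EXAMPLE.splitlines()]
for feature in features:
    print(feature)
```

The free-text descriptions ("HYDROXYLATION.", "POSSIBLE.") show why a controlled vocabulary was needed: nothing in these lines says which modified residue is meant or what evidence supports it.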
24
The 2nd phase: 2000 to 2005
  • Complete overhaul and significant extension of a
    controlled vocabulary for PTMs
  • Creation of a PTM annotation program within the
    Swiss-Prot groups at SIB and EBI
  • Development of new tools (Sulfinator, DGPI) for
    the prediction of some PTMs
  • Massive clean up and re-annotation of many
    classes of PTMs.

25
The expanding world of PTMs
  • We now have 283 different PTM descriptions
    (excluding processing, disulfide bonds and
    glycosylation events).

26
The new document listing post-translational
modifications
Contains many items of information and is available
in HTML format or by FTP in tab-delimited format.
27
Finally LSEs for PTMs!
  • Finally Proteoman has arrived! PTM information
    can now be obtained from the results of
    large-scale proteomics experiments (LSEs)
  • In the past 12 months we have added about 6000
    experimental PTMs using data originating from
    some of these projects.

28
But LSEs are not so easy to deal with
  • Issues related to the incorporation of LSE PTM
    data:
  • Quality:
  • Trying to assess whether the methodology really
    allows the detection of in vivo modifications
  • How many false positives are expected (this
    information is often absent or very well hidden!)
  • Accessing the data:
  • Often in supplementary material tables and in a
    variety of formats (HTML tables, Excel
    spreadsheets, etc.)
  • With a variety of identifiers (UniProtKB, NCBI
    gi, pID, etc.)
  • Sanity checking:
  • Making sure that the right sequence position is
    modified
  • Does it make sense in the biological context?
  • Propagating the information to orthologs.
  • So the big issue is: how will we be able to scale
    up and deal with the expected increase in the
    number of such projects?
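One of the sanity checks listed above, making sure that the right sequence position is modified, can be sketched in Python as follows. The residue table is a deliberately tiny, illustrative assumption and not a Swiss-Prot rule set; `ALLOWED_RESIDUES` and `check_site` are hypothetical names.

```python
# Residues that may carry each modification in this toy example.
# This table is an illustrative assumption, not a curated rule set.
ALLOWED_RESIDUES = {
    "phosphorylation": set("STY"),
    "N-linked glycosylation": set("N"),
}


def check_site(sequence, position, modification):
    """Return (ok, message) for a PTM reported at a 1-based sequence position."""
    if not 1 <= position <= len(sequence):
        return False, f"position {position} is outside the sequence (length {len(sequence)})"
    allowed = ALLOWED_RESIDUES.get(modification)
    if allowed is None:
        return False, f"unknown modification type: {modification}"
    residue = sequence[position - 1]
    if residue not in allowed:
        return False, f"{modification} reported on {residue}{position}"
    return True, f"{residue}{position} is compatible with {modification}"


# A made-up five-residue sequence, used only to exercise the check.
toy_sequence = "MKSAY"
for pos in (3, 2, 9):
    print(pos, check_site(toy_sequence, pos, "phosphorylation"))
```

A check like this only catches gross mapping errors, for example a position reported against a different isoform or sequence version; it says nothing about whether the modification makes biological sense.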

29
Cross-references then
  • The DR lines were introduced in release 4 in
    April 1987; they first linked Swiss-Prot to EMBL,
    PDB and PIR
  • They were instrumental in the development of SRS
    by Thure Etzold in the early 90s
  • And also in that of ExPASy, the first web server
    in the life sciences, in 1993.

30
UniProtKB/Swiss-Prot explicit links
Organism-specific gene databases: AGD, DictyBase,
EchoBASE, EcoGene, FlyBase, GeneDB_Spombe, GeneFarm,
Gramene, HGNC, H-InvDB, HIV, LegioList, Leproma,
ListiList, MaizeDB, MGI, MIM, MypuList, PhotoList,
RGD, SagaList, SGD, StyGene, SubtiList, TAIR,
TubercuList, WormBase, WormPep, ZFIN
Family and domain databases: Gene3D, HAMAP, InterPro,
PANTHER, PIRSF, Pfam, PRINTS, ProDom, PROSITE, SMART,
TIGRFAMs
Enzyme and pathway databases: BioCyc, Reactome
Sequence databases: EMBL, PIR, UniGene
2D-gel databases: ANU-2DPAGE, Aarhus/Ghent-2DPAGE,
COMPLUYEAST-2DPAGE, ECO2DBASE, HSC-2DPAGE, OGP,
PHCI-2DPAGE, PMMA-2DPAGE, Rat-heart-2DPAGE,
Siena-2DPAGE, SWISS-2DPAGE
Protein family/group databases: PptaseDB, GermOnline,
MEROPS, REBASE, TRANSFAC
Miscellaneous: dbSNP, GO, IntAct, LinkHub,
RZPD-ProtExp
3D structure databases: HSSP, PDB, SMR
Genome annotation databases: Ensembl, GenomeReviews,
TIGR
PTM databases: GlycoSuiteDB, PhosSite
31
Cross-references now
  • There are now cross-references from Swiss-Prot to
    74 different databases (6 more are in the
    pipeline)
  • Almost 3 million DR lines: an average of 12 per
    entry
  • Many other links to external resources are also
    available through the OX (NCBI taxonomy), RX
    (PubMed, DOI), CC (Web resource topic) and FT
    lines (dbSNP)
  • Cross-references are not only a means to help
    navigate between resources; they sometimes add
    information to the entries.

32
Examples of cross-references that provide
information
  • The cross-references to the Gene Ontology (GO):
  • DR GO; GO:0005634; C:nucleus; ISS.
  • DR GO; GO:0005515; F:protein binding; IPI.
  • DR GO; GO:0007165; P:signal transduction; TAS.
  • The PDB cross-references include information on
    the mapping of the structure on the sequence:
  • DR PDB; 1QQG; X-ray; A/B=4-267.
  • The cross-references to domain databases include
    information on the name/acronyms of the domains
    and the number of occurrences of these domains:
  • DR PROSITE; PS50026; EGF_3; 2.
  • DR PROSITE; PS50092; TSP1; 3.
  • DR PROSITE; PS01208; VWFC_1; 1.
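To show how the extra information in these cross-references can be pulled out, here is a minimal Python sketch. It assumes the usual flat-file punctuation of DR lines (fields separated by ";" and a terminating "."), which the slide transcript does not preserve reliably; the function names and the layout of the returned dictionaries are made up for illustration.

```python
# GO aspect codes used in the third field of a GO cross-reference.
GO_ASPECTS = {
    "C": "cellular component",
    "F": "molecular function",
    "P": "biological process",
}


def parse_dr_line(line):
    """Split a DR line into (database, primary identifier, remaining fields)."""
    if not line.startswith("DR"):
        raise ValueError(f"not a DR line: {line!r}")
    body = line[2:].strip()
    if body.endswith("."):
        body = body[:-1]
    fields = [field.strip() for field in body.split(";")]
    return fields[0], fields[1], fields[2:]


def summarize(line):
    """Turn the database-specific trailing fields into named values."""
    db, ident, rest = parse_dr_line(line)
    if db == "GO":
        aspect, term = rest[0].split(":", 1)
        return {"db": db, "id": ident, "aspect": GO_ASPECTS[aspect],
                "term": term, "evidence": rest[1]}
    if db == "PDB":
        chains, _, residues = rest[1].partition("=")
        start, end = residues.split("-")
        return {"db": db, "id": ident, "method": rest[0],
                "chains": chains.split("/"), "start": int(start), "end": int(end)}
    if db == "PROSITE":
        return {"db": db, "id": ident, "name": rest[0], "occurrences": int(rest[1])}
    return {"db": db, "id": ident, "fields": rest}


for example in (
    "DR   GO; GO:0005634; C:nucleus; ISS.",
    "DR   PDB; 1QQG; X-ray; A/B=4-267.",
    "DR   PROSITE; PS50092; TSP1; 3.",
):
    print(summarize(example))
```

Each database gets its own small branch because the meaning of the trailing fields differs per resource; any other database falls through to a generic record.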

33
From sequences to structures... and back!
  • Efficient bidirectional links between UniProtKB
    and PDB/MSD are very important
  • Currently 10000 Swiss-Prot entries are linked to
    30200 PDB entries
  • These links are constantly updated and verified;
    the converse is unfortunately not yet true
  • We have always made use of 3D structure
    information to help in the annotation process
  • But we are only now starting to systematically
    mine 3D structures to extract various information
    such as disulfide bonds, metal-binding sites,
    active sites, etc.

34
So what is the future of cross-references?
  • Will we really need hard-coded cross-references
    in the future?
  • Can we gradually replace some of them by links
    computed on the fly using referenceable objects?
  • Will we make more use of client-server systems
    such as the distributed annotation system (DAS)?
  • The answer is obviously dependent on
    standardization
  • But the Life Sciences are still living in the
    dark ages of the tower of Babel

35
CVs and ontologies
  • Since the very beginning of Swiss-Prot we have
    been building a growing set of controlled
    vocabularies (ontologies)
  • Species, strains, plasmids, journals, tissues,
    PTMs, domain names and, of course, keywords are
    all under control (see posters SP117 and
    SP120)
  • We are very well advanced in the process of
    having a CV for pathways (see the UniPathway
    poster SP140)
  • We are now tackling the problems of protein and
    gene names (see poster SP118). But this is of
    course not very easy!

36
Do we need annotations?
  • Annotators spend a big part of their time
    capturing and synthesizing a huge amount of
    functional information
  • For example we populate Swiss-Prot with data
    relevant to:
  • Role and function of the proteins
  • Subcellular location
  • Interactions (binary and complex)
  • Tissue specificity, developmental stage
  • Involvement in diseases.
  • We have much anecdotal evidence that users find
    this very important and that it is one of the
    important hallmarks of Swiss-Prot. Yet is this
    really true?

37
Do we need annotations? part 2
  • This is a time-consuming process and we will
    never be complete and up to date
  • Many users want quick and easy-to-summarize
    answers, yet the more detailed an entry becomes,
    the harder it is to transform it into a
    summarizable entity
  • We are often the victims of the FASTA format
    syndrome: users expect everything important
    about a protein to be available in the header of
    a FASTA format entry!
  • So should we continue?

38
Yes we need annotation!
  • Because (among many other reasons)
  • Automated annotation is the only way to
    transfer knowledge from a model organism to a
    less studied one
  • To apply such techniques safely one needs
    template entries that are representative of the
    state of the knowledge
  • While literature mining tools could be conceived
    as a way to automatically build a summary view of
    the knowledge around a given protein, these
    techniques are not yet powerful enough to create
    a coherent synthetic view
  • Literature mining tools also require the
    existence of well annotated (corpus) entries.

39
From pull to push...
  • For more than 20 years now we have been pulling
    information and knowledge from various sources,
    but mainly from the literature
  • It is now time to make sure that the next 20
    years will be defined by the fact that
    researchers push their results, and the
    interpretation of their results, into the
    knowledgebase.

40
Adopt a protein
  • An attempt to get the community to directly
    submit information on the proteins that they are
    studying
  • Using a Wikipedia-type model/interface
  • Will first be field-tested in the yeast
    community
  • We are hopeful, yet we are realistic: only a small
    percentage of life science researchers will take
    the time and be altruistic enough to fully
    participate in such a scheme.

41
Grey grey matter counts!
  • Many life scientists who have knowledge of the
    molecular world and are computer-proficient
    are reaching retirement age
  • Some want to continue to play a role in the
    advancement of research, yet they will not be
    able to do lab work anymore
  • We should offer them the tools necessary for them
    to contribute to the annotation process.

42
Anabelle and Asterix
  • Two important tools could contribute to the
    democratization of Swiss-Prot style annotation
  • Anabelle: a web-based protein sequence analysis
    platform
  • Asterix: the new Swiss-Prot editor.

43
Anabelle selection module
Viewer Layout
Link to entry NiceProt view
Blast (full) entry
more links!...
Links
Link to InterPro
Link to domain original database
Link to most similar entry NiceProt view
Align most similar entry with entry
Blast uncharted region
44
And here is what the user gets back
45
But what about the rest of the life scientists?
  • We saw how we could get parents (adopt a protein)
    and grandparents (grey matter counts) involved,
    but what about the children...
  • ...the young researchers, those who are active in
    producing new knowledge?

46
Two carrots, a stick and lots of education!
  • The carrots:
  • Making sure that granting agencies look favorably
    on the involvement of researchers in the process
    of submitting information to databases
  • The same criterion should be considered by any
    hiring or promotion committee
  • The stick: getting journal editors to refuse to
    publish a paper if the results have not been
    submitted to the relevant knowledge resources

47
Education!
  • Everyone should feel concerned
  • Awareness of the content and usage of knowledge
    resources is a prerequisite for any type of
    "serious" research in the field of molecular
    life sciences
  • Organizations such as EMBNet, EBI, SIB, NCBI, NIG
    should continue and strengthen their outreach
    efforts
  • We (database providers) should do more in terms
    of providing tutorials (on-line and on-site).

48
An important issue
  • The process of developing a data resource for the
    Life Sciences is akin to the work of medieval
    copyists, Renaissance encyclopedists or the
    19th-century development of the OED: it is a very
    tedious, manually intensive, long-term job

49
How to get funding for knowledge infrastructures
in the life sciences?
  • Funding knowledge resources is difficult
  • It's a very long-term process
  • It's not prestigious
  • and it's not cheap!

50
And it's not only databases that are endangered!
Service groups are also at risk
51
Proposition for a new tax
  • Each grant proposal for a high throughput
    data-producing project would be obliged to set
    aside a predefined percentage of the grant money
    to help cover the cost of storing and managing
    the produced data
  • How this money would be redistributed is not
    trivial to define and even less to implement
  • The priority would be to use this tax as a
    financial tool to help fund the data repositories.

52
The 6 observations of a "databaser"
  • Your task will be much more complex and far
    bigger than you ever thought it could be
  • If your database is successful and useful to the
    user community, then you will have to dedicate
    all your efforts to developing it for a much
    longer period of time than you would have thought
    possible
  • You will always wonder why life scientists abhor
    complying with nomenclature guidelines or
    standardization efforts that would simplify your
    life and theirs
  • You will have to continually fight to obtain a
    minimal amount of funding
  • As with any service effort, you will be told far
    more about what you do wrong than about what you
    do right
  • But when you see how useful your efforts are
    to your users, all the above drawbacks will lose
    their importance!!

53
Thank You
Aiala, Alain x4, Alan x4, Alastair, Alex x2,
Alexander x2, Alexandre x2, Alice, Alistair,
Allyson, Alvis, Amanda, Ana Tereza, Anastasia,
Andre x3, Andrea, Andreas, Andrew, Angela, Anne
x4, Anne-Lise, Anthony, Antoine, Anulka, Arnaud
x2, Arthur, Astrid, Athel, Barbara x2, Barend,
Baris, Barry, Bart, Bastien, Bengt, Bernard x2,
Bernd, Bernhard x2, Bill, Bob, Brigitte, Bruno
x2, Burkhard, Carl, Carola, Carolyn, Catherine
x4, Cathy x2, Cecile x2, Cecilia, Cedric, Cesare,
Chantal x3, Charles x2, Chris, Chrissie,
Christian x3, Christiane, Christine x2,
Christoph, Christophe, Christopher, Christos,
Claude, Claudia x2, Claudine, Colin, Colombe,
Corinne, Cristiano, Damien, Dan, Dana, Daniel x3,
Daniela, Danielle, Darcy, Darren, Dave x2, David
x5, Delphine, Denis x2, Dennis, Des, Dietmar,
Dolnide, Dominique, Doron, Dorothy, Doug, Duncan,
Eddie, Edgar, Edouard, Eleanor, Elisabeth x2,
Elmar, Elvis, Emily, Emmanuel, Eric x3, Erik,
Ernest, Ernst, Esther, Eugene x2, Eva, Eve,
Evelyn, Evgenia, Evgeny, Ewan, Fabrice, Fiona,
Flavio, Florence x3, Fotis, Francis, Frank,
François x3, Frederic, Frederique x2, Gabriel,
Gabriella, Ganesh, Gaston, Geoff, Gerry, Gert,
Ghislaine, Gilbert, Gill, Goran, Gottfried,
Graham x2, Greg, Gregoire, Guido, Guillaume,
Gunnar, Guy x2, Guy-Olivier, Hanah, Heidi,
Henning, Hien, Hilde, Holger, Hongzhan, Howard,
Hsing-Kuo, Ian, Iirit, Ilkka, Ioannis, Irving,
Isabelle x2, Ivan x2, Ivo, Jack, Jacques x2,
Jaime, Janet x2, Jean-Charles, Jean-François,
Jean-Jacques, Jean-Michel, Jean-Pierre x2,
Jeffrey, Jenny, Jerome, Jim, Jingchu, Joachim,
Joanna, Joel, John x7, Jonas, Jonathan x2, Jorja,
Jos, Juan, Juergen, Julia, Julio, Julius, Kai,
Karin, Karine, Kate, Kati, Katja, Katsumi, Kay,
Keiichi, Keith x3, Ken x2, Kenta, Khaled, Kirill,
Kirsty, Kristian, Larry, Laure, Laurent x3, Lee,
Leigh, Leon, Li, Lina, Lionel, Lisa x2, Livia,
Lorenza, Lorenzo, Louise, Luca, Luciane, Lucien,
Luisa x2, Luiz, Lydie x2, Ma'ayan, Madelaine,
Maggie, Mahesh, Manolo, Manuel x2, Manuela, Marc
x6, Marcia, Marco, Margaret x2, Mari Trini,
Maria, Maria Esperanza, Maria-Jesus,
Marie-Claude, Marilyn, Marisa, Mark x2, Martin
x2, Martine, Marvin, Mary, Massimo, Matteo,
Matthew, Mauricio, Michael x7, Michel x3,
Michele, Michelle, Miguel, Mike x2, Minna,
Minoru, Monica, Monika, Morido, Nabil, Nadeem,
Nadine x2, Naruya, Nasri, Natalia, Nathalie, Neil
x2, Nicky, Nicola, Nicolas x3, Nicole x3,
Nicoletta, Nicolle, Nikos, Nina, Oliver, Olivier
x4, Orna, Owen, Paolo, Pascal, Pat, Patricia x6,
Patrick x5, Paul, Paula, Pavel, Pedro, Peer,
Peter x7, Petra, Phil x2, Philip, Philippe x3,
Pierre, Pierre-Alain, Pieter, Piotr, Rachael,
Raffaella, Rainer, Raja, Rasko, Raton laveur,
Rebecca x2, Rein, Reinhard x2, Remi, Reto,
Reynaldo, Rich, Richard, Robert x2, Roberto,
Robin, Rodger, Rodrigo, Roland, Ron, Rosita,
Ross, Roy, Russ x2, Ruth x3, Saeid, Salvo, Samia,
Samuel x2, Sandor, Sandra x2, Sandrine, Sarah,
Scott, Sebastien x2, Serenella, Sergio, Severine
x2, Shigehaki, Shmuel, Shoko, Shoshana, Shyamala,
Silvia x2, Sineaid, Siv, Sona, Soren, Sorogini,
Steffen x2, Steffi, Stephanie x2, Steve, Steven,
Stuart x2, Stylianos, Sunil, Sylvain, Sylvie x2,
Takashi, Tamara, Tammera, Tania x2, Temple,
Terri, Terry, Thomas x3, Thure, Tim x2, Timothy,
Toby, Tom, Toni, Torsten, Ujwal, Ulrich, Ursula,
Valeria, Vassilios, Veronique, Vicente, Victor
x2, Vincent, Vinnei, Violaine, Virginie x2,
Vitaliano, Vitek, Vivien x2, Vivienne, Wanessa,
Wei mun, Weimin, William, Williams, Willy,
Winona, Winston, Witek, Wolfgang, Xavier x2,
Yasmin, Yasuhiro, Yongxing, Yoshio, Youla,
Young-Ki, Zeev, Zhang-Zhi.