Title: Web Services for N-Glycosylation Process
1 Web Services for N-Glycosylation Process
Satya S. Sahoo, Amit P. Sheth, William S. York, John A. Miller
Presentation at the International Symposium on Web Services for Computational Biology and Bioinformatics, VBI, Blacksburg, VA, May 26-27, 2005
Integrated Technology Resource for Biomedical Glycomics, NCRR/NIH
2 Glycomics
- Study of the structure, function and quantity of complex carbohydrates synthesized by an organism
- Carbohydrates are added to the basic protein structure: glycosylation
Folded protein structure (schematic)
3 Glycosylation: why is it important?
- The genome (comprised of DNA) and the proteome (proteins) are not the only factors in the life functions of an organism
- Carbohydrates attached to different protein structures (by glycosylation) are important for:
  - Identification of foreign entities by immune system cells
  - Markers to accurately diagnose diseases
  - Regulation of signaling activities
- Glycosylation is categorized by the way carbohydrates are attached to proteins. Example: N-glycosylation
4 N-Glycosylation Process (NGP)
By N-glycosylation Process, we mean the identification and quantification of glycopeptides.
Workflow (schematic), in order of appearance:
- Cell Culture -> extract -> Glycoprotein Fraction
- Glycoprotein Fraction -> proteolysis -> Glycopeptides Fraction
- Glycopeptides Fraction (1) -> Separation technique I -> Glycopeptides Fraction (n)
- Glycopeptides Fraction (n) -> PNGase -> Peptide Fraction (n)
- Peptide Fraction -> Separation technique II -> Peptide Fraction (nm)
- Peptide Fraction -> Mass spectrometry -> ms data, ms/ms data
- ms data, ms/ms data -> Data reduction -> ms peaklist, ms/ms peaklist
- Data analysis steps: binning, Peptide identification, Data correlation, Signal integration
- Data products: Peptide list, N-dimensional array
- Result: Glycopeptide identification and quantification
5 NGP: part of the Bioinformatics core, Integrated Technology Resource for Biomedical Glycomics
- This Resource was established by the National Center for Research Resources
- The aim is to develop the tools and technology to analyze glycoprotein and glycolipid expression of embryonic stem cells
- Our research provides bioinformatics support for four research groups:
  - Embryonic Stem Cell Culture Program
  - Glycomic Analysis of Glycoproteins
  - Glycomic Analyses of Glycosphingolipids and Sphingolipids
  - Transcript analysis by kinetic RT-PCR
6 NGP: need in Glycomics
- Unlike proteomics or genomics, high-throughput experimental protocols are still being established in Glycomics
- NGP involves a multitude of heterogeneous tasks, including human-mediated tasks
- NGP attempts to encapsulate particular computational steps as platform-independent, scalable and Web-accessible tools: Web Services
- Enables glycobiologists to integrate automated data generation tasks with data processing tools (Web Services) across the end-to-end experimental lifecycle
7 N-Glycosylation identification: Problems
- Extremely difficult to identify glycosylated peptide sequences using standard analytical methods
- N-glycosylation occurs at particular sites on the protein structure: consensus sequences
An example glycopeptide (schematic). Diagram labels: Peptide; Glycan; Consensus Sequence (N, X, S/T); Asparagine (N); Aspartate (D); J; PNGaseF
8 NGP: implementation
- NGP currently implements a Web Process composed of two Web Services:
  - DB Modifier Web Service modifies the search database by replacing N (in consensus sequences) with J
  - Collator Web Service identifies a probable N-glycosylated peptide using three parameters:
    - Calculated molecular mass
    - Presence of J in a peptide sequence
    - MASCOT score assigned to a hit
- NGP also involves a proprietary mass spectrometry search engine service (MASCOT) as an intermediate task
- Hence, the NGP Web Process identifies probable glycosylated peptides, enabling rapid processing of data from high-throughput experiments
http://www.matrixscience.com/
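The database modification step can be illustrated with a minimal Python sketch. It is not the DB Modifier Web Service itself: the function names are hypothetical, the database is assumed to be in FASTA format, and the consensus sequence is assumed to be N-X-S/T with X being any residue except proline (the slides only say "X").

```python
import re

# Assumption: the consensus sequence is N-X-S/T, where X is any residue except proline.
# The lookahead checks the two following residues without consuming them,
# so overlapping consensus sequences are all marked.
CONSENSUS_N = re.compile(r"N(?=[^P][ST])")


def mark_consensus_asparagines(sequence: str) -> str:
    """Replace N with J wherever N starts an N-X-S/T consensus sequence."""
    return CONSENSUS_N.sub("J", sequence)


def modify_fasta(lines):
    """Yield FASTA lines with consensus asparagines marked; header lines pass through.

    Simplification: each sequence is assumed to sit on a single line, so a
    consensus sequence split across wrapped lines would be missed.
    """
    for line in lines:
        if line.startswith(">"):
            yield line
        else:
            yield mark_consensus_asparagines(line)


if __name__ == "__main__":
    for out in modify_fasta([">example_protein", "MKNATLNPSQ"]):
        print(out)
```

In this sketch the N in "NAT" is replaced, while the N in "NPS" is left alone because proline occupies the X position.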
9 NGP Architecture (current)
Architecture (schematic):
- ms/ms raw data -> PEAK LIST FILE
- Primary Sequence Database -> ModifyDB Web Service
- PEAK LIST FILE and modified database -> MASCOT Mass Spectrometer Search Engine
- MASCOT output file (contains both glycosylated and non-glycosylated peptide sequences) -> Collator Web Service
- Collator Web Service -> Deglycosylated peptide list
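The collation step at the end of this pipeline might look like the following Python sketch. The slides name three parameters (calculated molecular mass, presence of J, MASCOT score) but do not say how they are combined, so the rule used here, the `Hit` type, and both threshold values are illustrative assumptions rather than the Collator Web Service's actual logic.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One search-engine hit (hypothetical container for this sketch)."""
    peptide: str            # peptide sequence, with J marking consensus-site N
    calculated_mass: float  # calculated molecular mass of the peptide
    observed_mass: float    # experimentally observed mass
    score: float            # MASCOT score assigned to the hit


def is_probable_glycopeptide(hit: Hit, min_score: float = 20.0,
                             mass_tolerance: float = 0.05) -> bool:
    """Apply the three criteria; both thresholds are made-up defaults."""
    has_marker = "J" in hit.peptide
    mass_ok = abs(hit.calculated_mass - hit.observed_mass) <= mass_tolerance
    return has_marker and mass_ok and hit.score >= min_score


def collate(hits, **thresholds):
    """Return only the hits that pass all three criteria."""
    return [h for h in hits if is_probable_glycopeptide(h, **thresholds)]


if __name__ == "__main__":
    hits = [
        Hit("AJGTK", 500.00, 500.01, 30.0),  # passes all three criteria
        Hit("ANGTK", 500.00, 500.01, 30.0),  # no J marker
        Hit("AJGTK", 500.00, 501.00, 30.0),  # mass difference too large
        Hit("AJGTK", 500.00, 500.00, 5.0),   # score too low
    ]
    print([h.peptide for h in collate(hits)])
```

Keeping the thresholds as keyword arguments makes it easy to tune them per experiment without changing the filter itself.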
10 NGP Results
Excerpt from a MASCOT output file (peptide hit records):
q1_p1=-1
q2_p1=0,626.349945,-0.023321,2,APGVAGR,18,000000000,1.49,00020000000000000,0,0;"gi|51465537":0:190:196:1
q2_p2=1,626.361191,-0.034567,2,APARGR,18,00000000,1.33,00020000000000000,0,0;"gi|10140845":0:2:7:2
q2_p3=0,626.349945,-0.023321,2,APAVGGR,18,000000000,1.33,00020000000000000,0,0;"gi|51470766":0:212:218:1,"gi|51470768":0:212:218:1
q3_p3=0,634.368973,0.006151,4,DIIFK,12,0000000,25.26,00010020000000000,0,0;"gi|47078238":0:364:368:2,"gi|47078240":0:328:332:2
q3_p4=0,634.351227,0.023897,4,MPLFK,12,0000000,25.24,00010020000000000,0,0;"gi|41197108":0:95:99:1,"gi|4557311":0:1:5:2
q3_p5=0,634.343811,0.031313,3,NNLFK,12,0000000,15.34,00010020000000000,0,0;"gi|31377725":0:539:543:1
q3_p6=0,634.368973,0.006151,3,LDIFK,12,0000000,15.34,00010020000000000,0,0;"gi|39725634":0:891:895:1
q3_p7=0,634.343811,0.031313,3,NNIFK,12,0000000,15.34,00010020000000000,0,0;"gi|7661646":0:212:216:1
q3_p8=0,634.368973,0.006151,3,LDLFK,12,0000000,15.34,00010020000000000,0,0;"gi|51474898":0:237:241:1
q3_p9=0,634.368958,0.006166,3,EVIFK,12,0000000,13.61,00010020000000000,0,0;"gi|28376662":0:67:71:1
q3_p10=0,634.368958,0.006166,3,VELFK,12,0000000,13.61,00010020000000000,0,0;"gi|51467300":0:493:497:1,"gi|51467535":0:99:103:1
q4_p1=-1
q5_p1=0,662.375122,0.004702,5,DLLFR,14,0000000,18.41,00020020000000000,0,0;"gi|21536369":0:84:88:1,"gi|21536367":0:17:21:1,"gi|4557871":0:647:651:1
q5_p2=0,662.375122,0.004702,3,DLFLR,14,0000000,12.81,00010020000000000,0,0;"gi|33695153":0:407:411:1,"gi|4504043":0:330:334:1,"gi|11968045":0:6:10:1
q5_p3=0,662.375122,0.004702,3,DIFIR,14,0000000,12.81,00010020000000000,0,0;"gi|4505725":0:924:928:1,"gi|29788751":0:1170:1174:1
q5_p4=0,662.349960,0.029864,3,NNFIR,14,0000000,11.84,00010020000000000,0,0;"gi|24416002":0:667:671:1
q5_p5=0,662.375122,0.004702,4,IDLFR,14,0000000,9.98,00020020000000000,0,0;"gi|12957488":0:602:606:1,"gi|41148707":0:536:540:1,"gi|51464463":0:646:650:1
q5_p6=0,662.375122,0.004702,4,LDLFR,14,0000000,9.98,00020020000000000,0,0;"gi|42657517":0:335:339:1
q5_p7=0,662.375107,0.004717,4,VELFR,14,0000000,9.98,00020020000000000,0,0;"gi|6912230":0:436:440:1
q5_p8=0,662.375122,0.004702,4,LDIFR,14,0000000,9.98,00020020000000000,0,0;"gi|8922081":0:2699:2703:1
q5_p9=0,662.349960,0.029864,4,NLNFR,64,0000000,5.89,00010020000000000,0,0;"gi|19923416":0:816:820:1
q5_p10=1,662.361191,0.018633,2,NRFAR,14,0000000,3.37,00010020000000000,0,0;"gi|4758704":0:97:101:1
q6_p1=0,674.359863,-0.006639,4,VSDNIK,35,00000000,11.27,00010020000000000,0,0;"gi|32130516":0:935:940:1
q6_p2=0,674.323456,0.029768,5,EGDLGGK,21,000000000,7.97,00020020000000000,0,0;"gi|13569928":0:1058:1064:1
q6_p3=0,674.359848,-0.006624,5,EATVAGK,21,000000000,7.88,00020020000000000,0,0;"gi|51475822":0:527:533:1
q6_p4=1,674.389740,-0.036516,3,QRMLK,14,0000000,7.46,00020010000000000,0,0;"gi|24307905":0:467:471:2,"gi|24307905":0:638:642:2
q6_p5=0,674.359863,-0.006639,5,LSSSPGK,56,000000000,7.38,00000020000000000,0,0;"gi|8922075":0:806:812:1
q6_p6=0,674.338730,0.014494,4,WDLGGK,42,00000000,6.40,00010020000000000,0,0;"gi|13375817":0:123:128:1
q6_p7=0,674.359879,-0.006655,4,QATDLK,56,00000000,6.21,00020010000000000,0,0;"gi|21361684":0:451:456:1
q6_p8=1,674.371094,-0.017870,3,QTNKGK,14,00000000,6.03,00020010000000000,0,0;"gi|41117716":0:85:90:1
q6_p9=1,674.389740,-0.036516,6,QMRIK,28,0000000,5.77,00020020000000000,0,0;"gi|28329439":0:269:273:1,"gi|28558993":0:278:282:1
q6_p10=1,674.389740,-0.036516,6,QMRLK,28,0000000,5.77,00020020000000000,0,0;"gi|40255096":0:300:304:1
q7_p1=0,695.348969,0.007855,4,YDASLK,14,00000000,8.86,00020020000000000,0,0;"gi|4758454":0:2761:2766:1
- A typical MASCOT output file is about 3 MB!
- High-throughput experimental protocols generate thousands of such files, so manual identification is not feasible
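To show why automation helps, here is a minimal Python sketch that reads one peptide hit record. It assumes records of the form `q2_p1=0,626.349945,-0.023321,2,APGVAGR,18,000000000,1.49,...;"gi|51465537":0:190:196:1`, where `-1` after the `=` means no match. The function name and the field names in the returned dictionary are hypothetical, and the field meanings (missed cleavages, peptide mass, mass delta, ions matched, peptide, score at the eighth position) are assumptions about this layout, not a full parser for MASCOT files.

```python
import re


def parse_peptide_hit(record: str):
    """Parse one 'qN_pM=...' record; return None for a query with no match."""
    key, _, value = record.strip().partition("=")
    if value == "-1":
        return None
    # Peptide fields come before ';', protein references after it.
    fields_part, _, proteins_part = value.partition(";")
    fields = fields_part.split(",")
    return {
        "key": key,
        "missed_cleavages": int(fields[0]),
        "peptide_mass": float(fields[1]),
        "mass_delta": float(fields[2]),
        "ions_matched": int(fields[3]),
        "peptide": fields[4],
        "score": float(fields[7]),
        # Accessions are the quoted strings in the protein part.
        "proteins": re.findall(r'"([^"]+)"', proteins_part),
    }


if __name__ == "__main__":
    record = ('q3_p3=0,634.368973,0.006151,4,DIIFK,12,0000000,25.26,'
              '00010020000000000,0,0;"gi|47078238":0:364:368:2,'
              '"gi|47078240":0:328:332:2')
    hit = parse_peptide_hit(record)
    print(hit["peptide"], hit["score"], hit["proteins"])
```

A loop over every line of every output file, followed by a filter on the parsed fields, replaces the manual inspection that the slide describes as infeasible.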
11 NGP Web Services: Adding Semantics
- Two Ontologies developed as part of the NCRR-Glycomics project
- GlycO: a domain Ontology embodying knowledge of the structure and metabolism of glycans
  - Contains 770 classes describing structural features of glycans
  - URL: http://lsdis.cs.uga.edu/projects/glycomics/glyco
- ProPreO: a comprehensive process Ontology modeling experimental proteomics
  - Contains 296 classes
  - Models three phases of experimental proteomics: Separation techniques, Analytical techniques, and Data analysis
  - URL: http://lsdis.cs.uga.edu/projects/glycomics/propreo
http://pedro.man.ac.uk/uml.html (PEDRO UML schema)
12 ProPreO: Experimental Proteomics Process Ontology
- ProPreO models the phases of a proteomics experiment using five fundamental concepts:
  - Data (Example: a peaklist file from ms/ms raw data)
  - Data_processing_applications (Example: MASCOT search engine)
  - Hardware: embodies instrument types used in proteomics (Example: ABI_Voyager_DE_Pro_MALDI_TOF)
  - Parameter_list: describes the different types of parameter lists associated with experimental phases
  - Task (Example: component separation, used in chromatography)
13 Service description using WSDL-S
- Formalize description and classification of Web Services using ProPreO concepts

WSDL ModifyDB (description of a Web Service using Web Service Description Language):

  <?xml version="1.0" encoding="UTF-8"?>
  <wsdl:definitions targetNamespace="urn:ngp"
      ..
      xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <wsdl:types>
      <schema targetNamespace="urn:ngp"
          xmlns="http://www.w3.org/2001/XMLSchema">
        ..
        </complexType>
      </schema>
    </wsdl:types>
    <wsdl:message name="replaceCharacterRequest">
      <wsdl:part name="in0" type="soapenc:string"/>
      <wsdl:part name="in1" type="soapenc:string"/>
      <wsdl:part name="in2" type="soapenc:string"/>
    </wsdl:message>
    <wsdl:message name="replaceCharacterResponse">
      <wsdl:part name="replaceCharacterReturn" type="soapenc:string"/>
    </wsdl:message>

WSDL-S ModifyDB:

  <?xml version="1.0" encoding="UTF-8"?>
  <wsdl:definitions targetNamespace="urn:ngp"
      xmlns:wssem="http://www.ibm.com/xmlns/WebServices/WSSemantics"
      xmlns:ProPreO="http://lsdis.cs.uga.edu/ontologies/ProPreO.owl">
    <wsdl:types>
      <schema targetNamespace="urn:ngp"
          xmlns="http://www.w3.org/2001/XMLSchema">
        </complexType>
      </schema>
    </wsdl:types>
    <wsdl:message name="replaceCharacterRequest"
        wssem:modelReference="ProPreO#peptide_sequence">
      <wsdl:part name="in0" type="soapenc:string"/>
      <wsdl:part name="in1" type="soapenc:string"/>
      <wsdl:part name="in2" type="soapenc:string"/>
    </wsdl:message>

Concepts defined in the ProPreO process Ontology (schematic): data, sequence, peptide_sequence
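A client or registry could read such annotations with a few lines of Python. The sketch below is only an illustration: the embedded document is a shortened, made-complete version of the ModifyDB description, the `wsdl` and `soapenc` namespace URIs are the standard WSDL 1.1 and SOAP encoding ones (the slide elides them, so they are assumptions here), the annotation value `ProPreO#peptide_sequence` is assumed, and `model_references` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs: WSDL 1.1 and the WSDL-S semantics namespace.
WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"
WSSEM_NS = "http://www.ibm.com/xmlns/WebServices/WSSemantics"

# Shortened, well-formed stand-in for the WSDL-S ModifyDB description.
WSDL_S_EXAMPLE = """<wsdl:definitions targetNamespace="urn:ngp"
    xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
    xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
    xmlns:wssem="http://www.ibm.com/xmlns/WebServices/WSSemantics">
  <wsdl:message name="replaceCharacterRequest"
      wssem:modelReference="ProPreO#peptide_sequence">
    <wsdl:part name="in0" type="soapenc:string"/>
    <wsdl:part name="in1" type="soapenc:string"/>
    <wsdl:part name="in2" type="soapenc:string"/>
  </wsdl:message>
  <wsdl:message name="replaceCharacterResponse">
    <wsdl:part name="replaceCharacterReturn" type="soapenc:string"/>
  </wsdl:message>
</wsdl:definitions>"""


def model_references(wsdl_text: str) -> dict:
    """Map each annotated wsdl:message name to its wssem:modelReference value."""
    root = ET.fromstring(wsdl_text)
    refs = {}
    for message in root.iter(f"{{{WSDL_NS}}}message"):
        ref = message.get(f"{{{WSSEM_NS}}}modelReference")
        if ref is not None:
            refs[message.get("name")] = ref
    return refs


if __name__ == "__main__":
    print(model_references(WSDL_S_EXAMPLE))
```

Messages without an annotation, such as the response message here, are simply skipped, so the same function works on plain WSDL and returns an empty mapping.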
14 Biological UDDI (BUDDI): WS Registry for Proteomics and Glycomics
- There are currently no registries that use semantic classification of Web Services in glycoproteomics
- BUDDI classification is based on proteomics and glycomics classification; BUDDI is part of an integrated glycoproteomics Web Portal called Stargate
- NGP is to be published in BUDDI
- This can enable other systems such as myGrid to use NGP Web Services to build a glycomics workbench
15 Conclusions
- As part of the NCRR Integrated Technology Resource for Biomedical Glycomics, we implemented a Semantic Web Process for high-throughput glycomics in an open, web-centric environment
- Large domain-specific ontologies with process (ProPreO) and domain (GlycO) knowledge concepts were used to describe and classify Web Services at the Semantic level
- We used the proposed Semantic Web Service specification (WSDL-S) to add semantics to Web Service descriptions
- Biological UDDI (BUDDI), part of Stargate, is being developed as a single-window resource to discover and publish Web Services in the glycoproteomics domain
16 Resources
- NCRR (Integrated Technology Resource for Biomedical Glycomics): http://cell.ccrc.uga.edu/world/glycomics/glycomics.php
- Bioinformatics core of Glycomics project: http://lsdis.cs.uga.edu/projects/glycomics/
- ProPreO process Ontology: http://lsdis.cs.uga.edu/projects/glycomics/propreo/
- GlycO domain Ontology: http://lsdis.cs.uga.edu/projects/glycomics/glyco/
- Stargate GlycoProteomics Web Portal: http://128.192.9.86/stargate
- WSDL-S joint UGA-IBM technical note: http://lsdis.cs.uga.edu/library/download/WSDL-S-V1.pdf
17 Acknowledgement
Special thanks: James Atwood (CCRC, UGA), Meenakshi Nagarajan (LSDIS Lab, UGA), Blake Hunter (LSDIS Lab, UGA)
18 Extra Slides: Stargate subsystems, a bit of detail
- BUDDI: BioUDDI is envisioned as the yellow pages for all WS in life sciences
  - The classification of WS uses biological taxonomy
  - Open resource for the worldwide community of life sciences research
- Format Converter: enables conversion of two available representation formats into an XML-based representation
  - IUPAC to LINUCS to GLYDE (an XML-based representation)
- Web Service Generator: enables existing Java applications to be exposed as Web Services
  - Generates required files from a Java application to allow deployment as a Web Service
  - Enables the newly generated Web Service to be published on BioUDDI
19 Extra Slides: Stargate subsystems, a bit of detail
- Group Forum: members of the research group use it to foster a sense of community
  - Schedule meetings, discuss issues, collaborate on papers
  - Post papers for peer review, publications on relevant topics
- Stargate Search: an integrated unit of Stargate
  - Enables search for research publications within the research group
  - Enables search on the internet
- Login: allows restrictions on accessibility of selected parts of Stargate
20 Extra Slides: The take-home message
Diagram labels (schematic): Internet, Forum, Search, Web Service Generator, BUDDI