Title: Bio-Trac 25 (Proteomics: Principles and Methods)
1. Tutorial: Bioinformatics Resources
(http://pir.georgetown.edu/pirwww/workshop/bioinfo_resource.html)
- Bio-Trac 25 (Proteomics: Principles and Methods)
- October 3, 2008
- Zhang-Zhi Hu, M.D.
- Research Associate Professor
- Protein Information Resource, Department of Biochemistry and Molecular & Cellular Biology
- Georgetown University Medical Center
2. What is Bioinformatics?
computer (information) + mouse (biology) = bioinformatics
- NIH Biomedical Information Science and Technology Initiative (BISTI) Working Definition (2000): Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.
3. Molecular Biology Database Collection
1078 key databases in 14 categories
(http://nar.oxfordjournals.org/cgi/content/full/36/suppl_1/D2)
4. Database Collection in Nucleic Acids Res.
5. Online Access to Database Collection
http://pir.georgetown.edu/pirwww/workshop/2005_database_update.html
2008
http://www.oxfordjournals.org/nar/database/cap/
6. Overview
Database Contents, Search and Retrieval
- Text search / Information retrieval
- Sequence genomics databases
- Protein family databases
- Databases of protein functions
- Databases of protein structures
- Proteomics databases
Lab session
7. Entrez Text Searches
(http://www.ncbi.nlm.nih.gov/Entrez/)
Lab
8. PubMed Literature Database
(http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=Search&DB=PubMed)
PMID: 14640721
Lab
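A PubMed text query can also be written as a URL, which is handy for repeating a search from a script. The sketch below assumes NCBI's E-utilities ESearch endpoint, which is not shown on the slide. The helper name `pubmed_search_url` is hypothetical, and the code only builds the URL without contacting NCBI.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities ESearch (not taken from the slide).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_search_url(term, retmax=20):
    """Build (but do not send) an ESearch URL for a PubMed text query."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    # urlencode escapes spaces and other reserved characters in the query.
    return ESEARCH + "?" + urlencode(params)

url = pubmed_search_url("alpha crystallin AND phosphorylation")
print(url)
```

Pasting the printed URL into a browser should return the matching PMIDs as XML, provided the endpoint behaves as assumed.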
9. iProLINK: Protein Literature Mining Resource
RLIMS-P
Text mining for protein phosphorylation
BioThesaurus
Gene/protein name thesaurus: synonyms, ambiguous names
http://pir.georgetown.edu/iprolink/
Lab
10. BioThesaurus: Gene/protein name searches (synonyms, ambiguous names)
Synonyms: CRYAA; crystallin, alpha A; CRYA1; HSPB4
http://pir.georgetown.edu/iprolink/biothesaurus
Lab
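The idea behind a name thesaurus can be sketched as a lookup table that maps every synonym to the symbols that use it. This is a toy illustration built only from the four names on the slide, not BioThesaurus's actual data model. Treating CRYAA as the key and the helper names `build_index` and `lookup` are assumptions made for the example.

```python
# Toy thesaurus built only from the synonyms listed on the slide.
# Using CRYAA as the "canonical" key is an illustrative assumption.
SYNONYMS = {
    "CRYAA": ["crystallin, alpha A", "CRYA1", "HSPB4"],
}

def build_index(thesaurus):
    """Map every name (case-insensitive) to the set of symbols that use it."""
    index = {}
    for canonical, names in thesaurus.items():
        for name in [canonical, *names]:
            index.setdefault(name.lower(), set()).add(canonical)
    return index

def lookup(index, name):
    """Return the symbols for a name; more than one result means the name is ambiguous."""
    return sorted(index.get(name.lower(), set()))

index = build_index(SYNONYMS)
print(lookup(index, "hspb4"))  # ['CRYAA']
```

A name shared by two different genes would come back with two symbols, which is the "ambiguous names" case the slide mentions.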
11. RLIMS-P: Text mining for protein phosphorylation
http://pir.georgetown.edu/iprolink/rlimsp/
Lab
12. PIR Text Search (I)
(http://pir.georgetown.edu/pirwww/search/textsearch.html)
Google-type search vs. Boolean searches (AND, OR, NOT)
Lab
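How AND, OR and NOT narrow or widen a result set can be illustrated with a few toy records. This is a minimal sketch, not PIR's search engine. The records, the field names and the `search` helper are all hypothetical.

```python
# Hypothetical records; the field names are illustrative, not PIR's schema.
RECORDS = [
    {"id": "A", "text": "alpha crystallin A chain rabbit"},
    {"id": "B", "text": "delta crystallin argininosuccinate lyase duck"},
    {"id": "C", "text": "argininosuccinate lyase human"},
]

def search(records, all_of=(), any_of=(), none_of=()):
    """AND = all_of, OR = any_of, NOT = none_of, matched on whole words."""
    all_of = [t.lower() for t in all_of]
    any_of = [t.lower() for t in any_of]
    none_of = [t.lower() for t in none_of]
    hits = []
    for rec in records:
        words = set(rec["text"].lower().split())
        if not all(t in words for t in all_of):
            continue  # an AND term is missing
        if any_of and not any(t in words for t in any_of):
            continue  # none of the OR terms is present
        if any(t in words for t in none_of):
            continue  # a NOT term is present
        hits.append(rec["id"])
    return hits

# crystallin NOT lyase
print(search(RECORDS, all_of=["crystallin"], none_of=["lyase"]))  # ['A']
```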
13. PIR Text Search (II)
Search: which alpha crystallin A chain entries are classified in protein families?
null = absent; not null = present
Search for synonyms
Lab
14. PIR Text Search (III)
Search: which crystallins are enzymes, and what families do they belong to?
Can you find which crystallins have a determined 3D structure?
Argininosuccinate lyase (EC 4.3.2.1)
Lab
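An EC number such as 4.3.2.1 is a four-level classification, and its first digit gives the top-level enzyme class. The sketch below splits an EC number and looks up that class. The class names follow the IUBMB scheme, and the helper names `parse_ec` and `ec_class` are made up for illustration.

```python
# Top-level EC classes in the IUBMB scheme (class 7 was added in 2018).
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}

def parse_ec(ec):
    """Split 'EC 4.3.2.1' (or '4.3.2.1') into its four integer levels."""
    digits = ec.upper().replace("EC", "").strip().split(".")
    if len(digits) != 4:
        raise ValueError(f"expected four levels, got {ec!r}")
    return tuple(int(d) for d in digits)

def ec_class(ec):
    """Return the name of the top-level class for an EC number."""
    return EC_CLASSES[parse_ec(ec)[0]]

print(parse_ec("EC 4.3.2.1"), ec_class("EC 4.3.2.1"))  # (4, 3, 2, 1) Lyases
```

Partial EC numbers such as 4.3.2.- are not handled by this sketch and would raise an error.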
15. UniProt Text Search
http://www.uniprot.org/
Find proteins that are related to diabetes and have a determined 3D structure.
Lab
16. Search continues
Lab
17. I. Sequence Genomics Databases
- NCBI Resources
  - GenBank: An annotated collection of all publicly available nucleotide and protein sequences.
  - RefSeq: NCBI non-redundant set of reference sequences, including genomic DNA, transcript (RNA), and protein products.
  - Entrez Gene: Gene-centered information at NCBI.
  - UniGene: Unified clusters of ESTs and full-length mRNA sequences.
  - OMIM: Online Mendelian Inheritance in Man, a catalog of human genetic and genomic disorders.
- UniProt Consortium Database: Universal protein resource, a central repository of protein sequence and function.
- Model Organism Genome Databases: MGD, RGD, SGD, FlyBase
- GeneCards: Integrated database of human genes, maps, proteins and diseases.
- SNP Consortium Database (dbSNP), International HapMap Project: Genes associated with human diseases
(http://www.oxfordjournals.org/nar/database/cap/)
18. UniProt Consortium Databases
Universal Protein Resource
(http://www.uniprot.org)
Since October 2002
Since July 2008
19. UniProt Report (I)
Lab
Sections of the record
Entry View Sequence Annotation
http://www.uniprot.org/uniprot/P02493
20. UniProt Report (II): sequence and features
Lab
21. UniProt Report (III): UniRef90
http://www.uniprot.org/uniref/?query=member%3aP02493+identity:0.9
22. Entrez Gene: Gene-centric information
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?dbgenecmdRetrievedoptGraphicslist_uids12954ubor0_RefSeq
23. OMIM: Online Mendelian Inheritance in Man
Autosomal recessive congenital progressive cataract
Juvenile cataract of Down syndrome
(http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=123580)
24. II. Protein Family Databases
- Whole Proteins
  - PIRSF: Nonoverlapping Classification of Full Length Proteins Based on Evolutionary Relationship
  - COG (Clusters of Orthologous Groups) of Complete Genomes
  - PANTHER: Proteins Classified into Families/Subfamilies of Shared Function
  - ProtoNet: Automatic Hierarchical Classification of Proteins
- Protein Domains
  - Pfam: Alignments and HMM Models of Protein Domains
  - SMART: Protein Domain Identification and Annotation
  - CDD: Conserved Domain Database
- Protein Motifs
  - PROSITE: Protein Patterns and Profiles
  - BLOCKS: Protein Sequence Motifs and Alignments
  - PRINTS: Compendium of Protein Fingerprints (a group of conserved motifs)
- Integrated Family Databases
  - InterPro: Integrates Pfam, PRINTS, PROSITE, ProDom, SMART, PIRSF, SuperFamily
25. Protein Clustering
Initial version
COGs (http://www.ncbi.nlm.nih.gov/COG/)
New version: Includes Eukaryotic Clusters (KOGs)
26. PIRSF Full Length Classification: iProClass Family Report
Lab
(http://pir.georgetown.edu/cgi-bin/ipcSF?id=SF002280)
27. Domain Classification: Pfam Domain
(http://www.sanger.ac.uk/cgi-bin/Pfam/swisspfamget.pl?name=CRYAA_RABIT)
(http://pir.georgetown.edu/cgi-bin/ipcEntry?id=P02493)
28. Pfam Domain
(http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF00525)
29. Protein Motifs: PROSITE
A database of protein families and domains. It consists of biologically significant sites, patterns and profiles.
(http://us.expasy.org/prosite/)
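PROSITE patterns use a compact notation that maps closely onto regular expressions. The sketch below covers the common elements: `x` for any residue, `[..]` for allowed residues, `{..}` for forbidden residues, `(n)` or `(n,m)` for repeats, and `<` / `>` for the sequence ends. The function name `prosite_to_regex` and the test sequence are made up for illustration, and rarer syntax such as `>` inside brackets is not handled.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE pattern such as 'N-{P}-[ST]-{P}' into a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):   # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):     # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Separate an optional repeat count like (3) or (2,4).
        core, _, repeat = element.partition("(")
        repeat = "{" + repeat.rstrip(")") + "}" if repeat else ""
        if core == "x":               # any amino acid
            core = "."
        elif core.startswith("{"):    # any amino acid except these
            core = "[^" + core[1:-1] + "]"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

# N-glycosylation site pattern (PROSITE PS00001) on a made-up sequence.
regex = prosite_to_regex("N-{P}-[ST]-{P}")
hits = [(m.start(), m.group()) for m in re.finditer(regex, "MKNGSAANPTA")]
print(regex, hits)  # N[^P][ST][^P] [(2, 'NGSA')]
```

Note that `re.finditer` reports only non-overlapping matches, so a real scanner that needs overlapping hits would have to search from every position.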
30. Integrated Family Classification
- InterPro
- An integrated resource unifying PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, and PIRSF.
(http://www.ebi.ac.uk/interpro/search.html)
Mapping of families
31. III. Databases of Protein Functions
- Metabolic Pathways, Enzymes, and Compounds
  - Enzyme Classification: Classification and Nomenclature of Enzyme-Catalysed Reactions (EC-IUBMB)
  - KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways
  - LIGAND (at KEGG): Chemical Compounds, Reactions and Enzymes
  - EcoCyc: Encyclopedia of E. coli Genes and Metabolism
  - MetaCyc: Metabolic Encyclopedia (Metabolic Pathways)
  - BRENDA: Enzyme Database
  - UM-BBD: Microbial Biocatalytic Reactions and Biodegradation Pathways
- Inter-Molecular Interactions and Regulatory Pathways
  - IntAct: Protein interaction data from literature and user submission
  - BIND: Descriptions of interactions, molecular complexes and pathways
  - DIP: Catalogs experimentally determined interactions between proteins
  - Reactome: A curated knowledgebase of biological pathways
  - BioCarta: Biological pathways of human and mouse
- GO: Gene Ontology Consortium Database
- Pathway Resources: Pathguide
32. Biological Pathway Resource Collection
http://www.pathguide.org/
- Protein-protein interactions
- Metabolic pathways
- Signaling pathways
- Pathway diagrams
- Transcription factors / gene regulatory networks
- Protein-compound interactions
- Genetic interaction networks
33. Pathway Commons
Search across multiple pathway databases; common format for global analysis
http://www.pathwaycommons.org/pc/home.do
34. KEGG: Metabolic & Regulatory Pathways
Lab
- KEGG is a suite of databases and associated software, integrating our current knowledge on molecular interaction networks, the information of genes and proteins, and of chemical compounds and reactions. (http://www.genome.ad.jp/kegg/kegg2.html)
(http://www.genome.ad.jp/dbget-bin/show_pathway?hsa00220+4.3.2.1)
35. BioCyc: EcoCyc/MetaCyc Metabolic Pathways
- The BioCyc Knowledge Library is a collection of Pathway/Genome Databases (http://biocyc.org/)
36. BioCarta: Cellular Pathways
(http://www.biocarta.com/index.asp)
37. Reactome (http://www.reactome.org/)
- Collaboration of CSHL, EBI and GO Consortium
- Curated resource of core pathways and reactions in human biology
- Authored by biological researchers who are field experts
- Cross-referenced with NCBI, Ensembl, UniProt, HapMap and KEGG
- Inferred orthologous events in 22 non-human species (e.g., mouse, rat)
38. Transforming Growth Factor (TGF) beta signaling (Homo sapiens)
(http://reactome.org/cgi-bin/eventbrowser?DB=gk_current&FOCUS_SPECIES=Homo%20sapiens&ID=170834)
Reactome events and objects (including modified forms and complexes)
Event -> REACT_6879.1: Activated type I receptor phosphorylates R-SMAD directly (Homo sapiens)
Object -> REACT_7364.1: Phospho-R-SMAD (cytosol)
Event -> REACT_6760.1: Phospho-R-SMAD forms a complex with CO-SMAD (Homo sapiens)
Object -> REACT_7344.1: Phospho-R-SMAD:CO-SMAD complex (cytosol)
Event -> REACT_6726.1: The phospho-R-SMAD:CO-SMAD transfers to the nucleus
Object -> REACT_7382.2: Phospho-R-SMAD:CO-SMAD complex (nucleoplasm)
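The alternating event/object chain on this slide can be held in a simple data structure, where each event is paired with the object it produces and that object's compartment. This is a minimal sketch and not Reactome's data model. The identifiers and names are transcribed from the slide, while the tuple layout and the `compartments` helper are illustrative assumptions.

```python
# (event id, event name, object id, object name, compartment), as listed on the slide.
# The tuple layout is an illustrative choice, not Reactome's schema.
TGF_BETA_STEPS = [
    ("REACT_6879.1", "Activated type I receptor phosphorylates R-SMAD directly",
     "REACT_7364.1", "Phospho-R-SMAD", "cytosol"),
    ("REACT_6760.1", "Phospho-R-SMAD forms a complex with CO-SMAD",
     "REACT_7344.1", "Phospho-R-SMAD:CO-SMAD complex", "cytosol"),
    ("REACT_6726.1", "The phospho-R-SMAD:CO-SMAD transfers to the nucleus",
     "REACT_7382.2", "Phospho-R-SMAD:CO-SMAD complex", "nucleoplasm"),
]

def compartments(steps):
    """List the product compartments in pathway order, dropping consecutive repeats."""
    path = []
    for *_, compartment in steps:
        if not path or path[-1] != compartment:
            path.append(compartment)
    return path

print(compartments(TGF_BETA_STEPS))  # ['cytosol', 'nucleoplasm']
```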
39. Protein-Protein Interaction Database: IntAct
(http://www.ebi.ac.uk/intact/)
40. Gene Ontology (GO)
(http://www.geneontology.org/)
- Molecular Function
- Biological Process
- Cellular Component
41. IV. Databases of Protein Structures
- Protein Structure
  - PDB: Structure Determined by X-ray Crystallography and NMR
  - PDBsum: Summaries and analyses of PDB structures
  - MMDB: NCBI's database of 3D structures, part of NCBI Entrez
  - SWISS-MODEL Repository: Database of annotated protein 3D models
  - ModBase: Annotated comparative protein structure models
- Structure Classification
  - CATH: Hierarchical Classification of Protein Domain Structures
  - SCOP: Familial and Structural Protein Relationships
  - FSSP: Protein Fold Classification Based on Structure-Structure Alignment
42. PDB: Experimental 3D Structure Repository
Rat gamma-crystallin (chains A and B)
Can you do a text search at PIR to find this (CRGE_RAT)?
(http://www.rcsb.org/pdb/)
Lab
43. PDBsum
Pictorial database providing summaries and analyses of PDB entries
Search
3-D structure summary
2-D structure summary
(http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/)
44. Protein Structural Classification (1)
CATH: Hierarchical domain classification of protein structures (http://www.cathdb.info/)
45. Protein Structural Classification (2)
SCOP: comprehensive description of structural and evolutionary relationships between all proteins whose structure is known.
(http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.html)
46. SWISS-MODEL Repository
http://swissmodel.expasy.org/
http://swissmodel.expasy.org/repository/
A database of annotated three-dimensional comparative protein structure models
(http://swissmodel.expasy.org/repository/smr.php?sptr_ac=CRBA1_MOUSE&job=2)
47. V. Proteomic Resources
- GELBANK (http://gelbank.anl.gov): 2D-gel patterns of species with completed genomes.
- SWISS-2DPAGE (http://www.expasy.org/ch2d/): index of 2D-gels
- PEP (http://cubic.bioc.columbia.edu/pep/): Predictions for Entire Proteomes, summarized analyses of protein sequences
- Integr8 (http://www.ebi.ac.uk/integr8/): A browser for information relating to completed genomes and proteomes, based on data contained in Genome Reviews and the UniProt proteome sets
- PRIDE (http://www.ebi.ac.uk/pride/): PRoteomics IDEntifications database; Expression Profiling databases
- GPMdb (http://gpmdb.thegpm.org/): Mass spec proteomics databases
- PeptideAtlas (http://www.peptideatlas.org/): compendium of peptides identified in a large set of tandem mass spectrometry proteomic experiments
- HUPO (http://www.hupo.org/): Human Proteome Organization, to foster international proteomics initiatives.
48. 2D-Gel Image Databases
Lab
(http://us.expasy.org/ch2d/)
Part of WORLD-2DPAGE, an index to 2-D PAGE databases and services
(http://us.expasy.org/swiss-2dpage/ac=P02489)
49. GPMdb: MS Data Search
(http://gpmdb.thegpm.org/)
Craig, et al., J Proteome Res. 2004, 3:1234-42.
50. PRIDE: centralized, standards-compliant, public data repository for proteomics data
http://www.ebi.ac.uk/pride/
51. Lab
- Text search / Information retrieval
  - Literature search and text mining
  - Finding synonyms (BioThesaurus)
  - Information extraction (e.g., protein phosphorylation sites)
  - Find the sequence for the rabbit alpha crystallin A chain
  - Find all alpha crystallin A chains classified in protein families
  - Search for crystallins that have enzyme activity
  - Find crystallins that have determined 3D structures
- Database contents (reports)
  - Sequence genomics databases (UniProt)
  - Protein family databases (PIRSF)
  - Database of protein functions (KEGG)
  - Databases of protein structures (PDB)
  - Proteomics databases (Swiss-2D)
- Protein Examples
  - Rabbit alpha crystallin A (UniProtKB CRYAA_RABIT/P02493)
  - Delta crystallin II (Argininosuccinate lyase) (UniProtKB ARLY2_ANAPL/P24058)
  - Any additional proteins of your interest for search and retrieval
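The example identifiers above combine a UniProtKB entry name (protein mnemonic, underscore, species mnemonic) with an accession number. The sketch below splits such a label and builds the entry URL using the pattern printed on the UniProt Report slide. The helper names are hypothetical, and the URL pattern may not match the current UniProt site.

```python
def parse_uniprot_id(label):
    """Split 'CRYAA_RABIT/P02493' into entry name, protein, species and accession."""
    entry_name, accession = label.split("/")
    protein, species = entry_name.split("_")
    return {
        "entry_name": entry_name,
        "protein": protein,
        "species": species,
        "accession": accession,
    }

def entry_url(accession):
    # URL pattern as printed on the UniProt Report slide (2008); an assumption today.
    return f"http://www.uniprot.org/uniprot/{accession}"

for label in ["CRYAA_RABIT/P02493", "ARLY2_ANAPL/P24058"]:
    info = parse_uniprot_id(label)
    print(info["protein"], info["species"], entry_url(info["accession"]))
```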