Title: Bio-Trac 25 (Proteomics: Principles and Methods)
1. Tutorial: Bioinformatics Resources
(http://pir.georgetown.edu/huz/class/bioinfo_resource.html)
- Bio-Trac 25 (Proteomics: Principles and Methods)
- March 26, 2004
- Zhang-Zhi Hu, M.D.
- Senior Bioinformatics Scientist
- Protein Information Resource
- National Biomedical Research Foundation, GUMC
2. What is Bioinformatics?
computer (information) + mouse (biology) = bioinformatics
- NIH Biomedical Information Science and Technology
Initiative (BISTI) Working Definition (2000):
Research, development, or application of
computational tools and approaches for expanding
the use of biological, medical, behavioral or
health data, including those to acquire, store,
organize, archive, analyze, or visualize such
data.
3. Molecular Biology Database Collection
(http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D3)
- 548 key databases in 11 categories
4. (http://pir.georgetown.edu/huz/class/2004_database_update.html)
5. Overview
Database Contents, Search and Retrieval
- Text search / Information retrieval
- Sequence and genomics databases
- Protein family databases
- Database of protein functions
- Databases of protein structures
- 2D-gel databases
- Proteomics databases
6. Entrez Text Searches
(http://www.ncbi.nlm.nih.gov/Entrez/)
7. PubMed Literature Database
(http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=Search&DB=PubMed)
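The Entrez and PubMed text searches above can also be issued from a script. The following is a minimal sketch, assuming NCBI's E-utilities ESearch service as the entry point; that service, the helper name `build_esearch_url`, and the example query are not part of the original slides. The sketch only builds the request URL and does not contact the server.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
# Assumption: this endpoint is not mentioned in the slides.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, db="pubmed", retmax=20):
    """Return an ESearch URL for a text query against one Entrez database.

    term   -- free-text query, e.g. "alpha crystallin AND rabbit"
    db     -- Entrez database name (hypothetical default: "pubmed")
    retmax -- maximum number of record identifiers to return
    """
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode escapes spaces and special characters in the query.
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_esearch_url("alpha crystallin AND rabbit"))
```

Opening the printed URL in a browser should return an XML list of matching record identifiers, which is roughly what the web search form does behind the scenes.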
8. UniProt Text Search
(http://www.pir.uniprot.org/cgi-bin/textSearch)
9. PIR Text Search (I)
(http://pir.georgetown.edu/pirwww/search/textsearch.html)
What is the difference between CRAA_RABIT and CYRBAA?
How about searching for Crystallin and SuperFamily?
10. PIR Text Search (II)
Can you use the PIR text search to find which crystallins have a
determined 3D structure?
11. I. Sequence and Genomics Databases
- GenBank: An annotated collection of all publicly available nucleotide and protein sequences.
- RefSeq: NCBI non-redundant set of reference sequences, including genomic DNA, transcript (RNA), and protein products.
- UniProt Consortium Database: Universal protein knowledgebase, a central resource of protein sequence and function from Swiss-Prot, TrEMBL and PIR.
- LocusLink: Curated sequences and descriptions of genetic loci.
- UniGene: Unified clusters of ESTs and full-length mRNA sequences.
- OMIM: Online Mendelian Inheritance in Man, a catalog of human genetic and genomic disorders.
- Model Organism Genome Databases: MGD, RGD, SGD, FlyBase
- GeneCards: Integrated database of human genes, maps, proteins and diseases.
- SNP Consortium Database
12. UniProt Consortium Database
UniProt (knowledgebase), UniRef (100, 90, 50), UniParc (archive)
(http://www.uniprot.org)
13. UniProt Sequence Report (I)
(http://www.pir.uniprot.org/cgi-bin/unipEntry?id=CRAA_RABIT)
14. UniProt Sequence Report (II)
(http://www.pir.uniprot.org/cgi-bin/unipEntry?id=UniRef90_P02489)
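The UniRef identifier in the report above encodes two pieces of information: the clustering level (100, 90, or 50, the percent sequence identity used to build the cluster) and the identifier of the entry the cluster is named after. A small sketch of how such an identifier could be split is shown below; the function and class names are hypothetical, and the pattern is an assumption based only on the example `UniRef90_P02489`.

```python
import re
from typing import NamedTuple


class UniRefId(NamedTuple):
    identity: int        # clustering level in percent: 100, 90, or 50
    representative: str  # identifier that follows the underscore


# Assumed shape: "UniRef" + level + "_" + identifier (from UniRef90_P02489).
_UNIREF_PATTERN = re.compile(r"^UniRef(100|90|50)_(\w+)$")


def parse_uniref_id(cluster_id):
    """Split a UniRef cluster identifier into level and representative."""
    match = _UNIREF_PATTERN.match(cluster_id)
    if match is None:
        raise ValueError(f"not a UniRef identifier: {cluster_id!r}")
    return UniRefId(int(match.group(1)), match.group(2))


if __name__ == "__main__":
    print(parse_uniref_id("UniRef90_P02489"))
```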
15. NCBI LocusLink
(http://www.ncbi.nlm.nih.gov/LocusLink)
16. OMIM: Online Mendelian Inheritance in Man
(http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=123580)
17. II. Protein Family Databases
- Whole Proteins
  - PIRSF: A Network Classification System of Protein Families
  - COG (Clusters of Orthologous Groups) of Complete Genomes
  - ProtoNet: Automated Hierarchical Classification of Proteins
- Protein Domains
  - Pfam: Alignments and HMM Models of Protein Domains
  - SMART: Protein Domain Families
  - CDD: Conserved Domain Database
- Protein Motifs
  - PROSITE: Protein Patterns and Profiles
  - BLOCKS: Protein Sequence Motifs and Alignments
  - PRINTS: Protein Sequence Motifs and Signatures
- Integrated Family Databases
  - iProClass: Superfamilies/Families, Domains, Motifs, Rich Links
  - InterPro: Integrates Pfam, PRINTS, PROSITE, ProDom, SMART, PIRSF, SuperFamily
18. Domain Classification
(http://www.sanger.ac.uk/cgi-bin/Pfam/swisspfamget.pl?name=CRAA_RABIT)
(http://pir.georgetown.edu/cgi-bin/ipcEntry?id=CRAA_RABIT)
19. Pfam Domain
(http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF00525)
20. Integrated Family Classification
- InterPro
  - An integrated resource unifying PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, and PIRSF.
(http://www.ebi.ac.uk/interpro/search.html)
21. PIRSF Full Length Classification: iProClass Family Report
(http://pir.georgetown.edu/cgi-bin/ipcSF?id=SF002280)
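The slides so far use several kinds of identifiers: a UniProt entry name (CRAA_RABIT), a UniProt accession (P02489), a Pfam accession (PF00525), and a PIRSF family number (SF002280). As a rough illustration of how they differ, the sketch below guesses the identifier type from its shape. The patterns are simplified assumptions inferred from these examples only, not the databases' official format definitions, and current releases may use different formats.

```python
import re

# Ordered list of (label, pattern). Patterns are simplified assumptions
# derived from the example identifiers that appear in the slides.
ID_PATTERNS = [
    ("Pfam accession", re.compile(r"^PF\d{5}$")),               # PF00525
    ("PIRSF family number", re.compile(r"^SF\d{6}$")),          # SF002280
    ("UniProt accession", re.compile(r"^[OPQ]\d[A-Z0-9]{3}\d$")),  # P02489
    ("UniProt entry name", re.compile(r"^[A-Z0-9]{1,5}_[A-Z0-9]{1,5}$")),  # CRAA_RABIT
]


def guess_id_type(identifier):
    """Return a label for the first pattern that matches, else 'unknown'."""
    for label, pattern in ID_PATTERNS:
        if pattern.match(identifier):
            return label
    return "unknown"


if __name__ == "__main__":
    for example in ["CRAA_RABIT", "P02489", "PF00525", "SF002280"]:
        print(example, "->", guess_id_type(example))
```

A lookup like this is only a convenience for routing an identifier to the right search form; the database itself remains the authority on whether an identifier is valid.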
22. III. Databases of Protein Functions
- Metabolic Pathways, Enzymes, and Compounds
  - Enzyme Classification: Classification and Nomenclature of Enzyme-Catalysed Reactions (EC-IUBMB)
  - KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways
  - LIGAND (at KEGG): Chemical Compounds, Reactions and Enzymes
  - EcoCyc: Encyclopedia of E. coli Genes and Metabolism
  - MetaCyc: Metabolic Encyclopedia (Metabolic Pathways)
  - WIT: Functional Curation and Metabolic Models
  - BRENDA: Enzyme Database
  - UM-BBD: Microbial Biocatalytic Reactions and Biodegradation Pathways
- Cellular Regulation and Gene Networks
  - EpoDB: Genes Expressed during Human Erythropoiesis
  - BIND: Descriptions of interactions, molecular complexes and pathways
  - DIP: Catalogs experimentally determined interactions between proteins
  - BioCarta: Biological pathways of human and mouse
  - GO: Gene Ontology Consortium Database
23. KEGG Metabolic and Regulatory Pathways
- KEGG is a suite of databases and associated software, integrating our current knowledge on molecular interaction networks, the information of genes and proteins, and of chemical compounds and reactions.
(http://www.genome.ad.jp/kegg/kegg2.html)
(http://www.genome.ad.jp/dbget-bin/show_pathway?hsa00220+4.3.2.1)
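The pathway link above appears to combine a KEGG pathway identifier (hsa00220) with an EC number (4.3.2.1). Both are structured: the pathway identifier is an organism prefix followed by a five-digit map number, and an EC number has four dot-separated levels whose first level names the enzyme class. The sketch below parses both; the function names are hypothetical, and the pathway pattern is an assumption based on this single example.

```python
import re

# Top-level EC classes as defined by the IUBMB enzyme nomenclature.
EC_CLASSES = {
    1: "oxidoreductases",
    2: "transferases",
    3: "hydrolases",
    4: "lyases",
    5: "isomerases",
    6: "ligases",
    7: "translocases",
}


def parse_ec_number(ec):
    """Split a complete EC number such as '4.3.2.1' into its four levels."""
    parts = ec.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a complete EC number: {ec!r}")
    ec_class, subclass, sub_subclass, serial = (int(p) for p in parts)
    return {
        "class": ec_class,
        "class_name": EC_CLASSES.get(ec_class, "unknown"),
        "subclass": subclass,
        "sub_subclass": sub_subclass,
        "serial": serial,
    }


def parse_kegg_pathway_id(pathway_id):
    """Split e.g. 'hsa00220' into (organism prefix, map number).

    Assumption: a lowercase prefix of 2 to 4 letters plus five digits.
    """
    match = re.fullmatch(r"([a-z]{2,4})(\d{5})", pathway_id)
    if match is None:
        raise ValueError(f"not a KEGG pathway identifier: {pathway_id!r}")
    return match.group(1), match.group(2)


if __name__ == "__main__":
    print(parse_kegg_pathway_id("hsa00220"))
    print(parse_ec_number("4.3.2.1"))
```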
24. BioCyc (EcoCyc/MetaCyc Metabolic Pathways)
- The BioCyc Knowledge Library is a collection of
Pathway/Genome Databases (http://biocyc.org/)
25. BioCarta Cellular Pathways
(http://www.biocarta.com/index.asp)
26. Protein-Protein Interaction: BIND
(http://www.bind.ca/)
27. Gene Ontology
(http://www.geneontology.org/)
Three GOs: Molecular Function, Biological Process, Cellular Component
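The three ontologies named above are commonly abbreviated with the one-letter aspect codes F, P, and C in GO annotation files. The sketch below groups a protein's annotations by aspect. The annotation list is an illustrative placeholder written for this example, not data retrieved from the GO database, and the function name is hypothetical.

```python
from collections import defaultdict
from enum import Enum


class Aspect(Enum):
    """The three Gene Ontology aspects and their one-letter codes."""
    MOLECULAR_FUNCTION = "F"
    BIOLOGICAL_PROCESS = "P"
    CELLULAR_COMPONENT = "C"


def group_by_aspect(annotations):
    """Group (term name, aspect code) pairs by aspect.

    Raises ValueError if a code is not one of F, P, or C.
    """
    grouped = defaultdict(list)
    for term, code in annotations:
        grouped[Aspect(code)].append(term)
    return dict(grouped)


if __name__ == "__main__":
    # Illustrative placeholder annotations for a lens crystallin.
    example = [
        ("structural constituent of eye lens", "F"),
        ("visual perception", "P"),
        ("cytoplasm", "C"),
        ("protein folding", "P"),
    ]
    for aspect, terms in group_by_aspect(example).items():
        print(aspect.name, terms)
```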
28. IV. Databases of Protein Structures
- Protein Structure
  - PDB: Structure Determined by X-ray Crystallography and NMR
  - PDBsum: Summaries and analyses of PDB structures
  - MMDB: NCBI's database of 3D structures, part of NCBI Entrez
  - SWISS-MODEL Repository: Database of annotated protein 3D models
  - ModBase: Annotated comparative protein structure models
- Structure Classification
  - CATH: Hierarchical Classification of Protein Domain Structures
  - SCOP: Familial and Structural Protein Relationships
  - FSSP: Protein Fold Classification Based on Structure-Structure Alignment
29. PDB 3D Structure
Rat gamma-crystallin, chains A and B. Can you do a
text search at PIR to find this?
(http://www.rcsb.org/pdb/)
30. PDBsum
Summary and Analysis (http://www.biochem.ucl.ac.uk/bsm/pdbsum)
31. Protein Structural Classification (1)
CATH: Hierarchical domain classification of protein structures
(http://www.biochem.ucl.ac.uk/bsm/cath_new/)
32. Protein Structural Classification (2)
SCOP: comprehensive description of the structural
and evolutionary relationships between all
proteins whose structure is known.
(http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.html)
33. SWISS-MODEL Repository
A database of annotated three-dimensional
comparative protein structure models
(http://swissmodel.expasy.org/repository/smr.php?sptr_ac=CRGE_RAT&job=2)
34. VI. Proteomic Resources
- GELBANK (http://gelbank.anl.gov): 2D-gel patterns from completed genomes; SWISS-2DPAGE (http://www.expasy.org/ch2d/)
- PEP: Predictions for Entire Proteomes (http://cubic.bioc.columbia.edu/pep/). Summarized analyses of protein sequences
- Proteome BioKnowledge Library (http://www.proteome.com): Detailed information on human, mouse and rat proteomes
- Proteome Analysis Database (http://www.ebi.ac.uk/proteome/): Online application of InterPro and CluSTr for the functional classification of proteins in whole genomes
- Expression Profiling databases:
  - GNF (http://expression.gnf.org/cgi-bin/index.cgi): human and mouse transcriptome
  - SMD (http://genome-www5.stanford.edu/MicroArray/SMD/): Stanford microarray data analysis
  - EBI Microarray Informatics (http://www.ebi.ac.uk/microarray/index.html): managing, storing and analyzing microarray data
35. 2D-Gel Image Databases (1)
(http://us.expasy.org/ch2d/2d-index.html)
(http://us.expasy.org/cgi-bin/nice2dpage.pl?P02489)
36. 2D-Gel Image Databases (2)
(http://gelbank.anl.gov/2dgels/index.asp)
37. Expression Profiling
- Human and Mouse Transcriptome
(http://genome-www.stanford.edu/serum/)
(http://expression.gnf.org/cgi-bin/index.cgi)
38. Lab