Title: BIO-TRAC 25 (Proteomics: Principles and Methods)
1 Tutorial: Bioinformatics Resources
- BIO-TRAC 25 (Proteomics: Principles and Methods)
- October 10, 2003
- NIH, Bethesda, MD
- Zhang-Zhi Hu, M.D.
- Senior Bioinformatics Scientist,
- Protein Information Resource
- National Biomedical Research Foundation, GUMC
2 What is Bioinformatics?
- Bioinformatics is the application of information technology to the analysis, organization and distribution of biological data in order to answer complex biological questions.
- NIH Biomedical Information Science and Technology Initiative (BISTI) Working Definition (2002): Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.
3 Bioinformatics Resources
- The Molecular Biology Database Collection: An Online Compilation of Relevant Database Resources, 2003 update (http://www3.oup.co.uk/nar/database/)
- Nucleic Acids Research Database Issues (January, annually) (2003: http://nar.oupjournals.org/content/vol31/issue1/)
- DBcat: A Catalog of > 500 Biological Databases (http://www.infobiogen.fr/services/dbcat/)
4 Molecular Biology Database Collection
(http://nar.oupjournals.org/cgi/content/full/31/1/1GKG120TB1)
5 The Molecular Biology Database Collection: 2003 update (Baxevanis, A.D.)
An online resource of 386 key databases in 18 categories
- Major sequence repositories
- Comparative Genomics
- Gene Expression
- Gene Identification and Structure
- Genetic and Physical Maps
- Genomic Databases
- Intermolecular Interactions
- Metabolic Pathways and Cellular Regulation
- Mutation Databases
- Pathology
- Protein Sequence Motifs
- Proteome Resources
- Retrieval Systems and Database Structure
- RNA Sequences
- Structure
- Transgenics
- Varied Biomedical Content
6 Overview
- Protein Sequence Analysis
- I. Sequence Similarity Search and Alignment
- II. Family Classification Methods
- III. Structure Prediction Methods
- Molecular Biology Databases
- IV. Protein Family Databases
- V. Databases of Protein Functions
- VI. Databases of Protein Structures
- Proteomic Resources
- VII. 2D-gel databases
- VIII. Proteomic analyses
7 I. Sequence Similarity Search
- Find a protein sequence: text search
- Based on Pair-Wise Comparisons
- BLOSUM scoring matrix
- PAM scoring matrix
- Dynamic Programming Algorithms
- Global Similarity: Needleman-Wunsch (GAP)
- Local Similarity: Smith-Waterman (BestFit, SSEARCH)
- Heuristic Algorithms (Sequence Database Searching)
- FASTA: Based on K-Tuples (2-Amino Acid)
- BLAST: Triples of Conserved Amino Acids
- Gapped-BLAST: Allow Gaps in Segment Pairs (NREF)
- PHI-BLAST: Pattern-Hit Initiated Search (NCBI)
- PSI-BLAST: Iterative Search (NCBI)
8 Sequence Search by Text or Unique ID
Entrez (http://www.ncbi.nlm.nih.gov/Entrez/)
(http://pir.georgetown.edu/pirwww/search/textsearch.html)
9 Pair-Wise Comparisons
- Scoring matrix
- Global and local similarity
- Dynamic Programming (Needleman-Wunsch, Smith-Waterman)
(http://www.ebi.ac.uk/emboss/align/)
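The dynamic programming idea behind both pair-wise methods can be sketched in a few lines. This is a minimal illustration, not the code behind any of the servers listed: it assumes a flat match/mismatch score in place of a BLOSUM or PAM matrix and a linear gap penalty, and the function name and default values are hypothetical. With `local=False` the table is filled as in Needleman-Wunsch (global); with `local=True` cells are floored at zero as in Smith-Waterman (local).

```python
def alignment_score(a, b, match=2, mismatch=-1, gap=-2, local=False):
    """Return the optimal alignment score of sequences a and b.

    local=False fills the table as in Needleman-Wunsch (global alignment);
    local=True floors every cell at zero as in Smith-Waterman (local alignment).
    The flat match/mismatch scores stand in for a real scoring matrix.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # Global alignment: leading gaps are penalized along the borders.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if local:
                cell = max(cell, 0)         # a local alignment may restart anywhere
                best = max(best, cell)
            score[i][j] = cell

    # Global score sits in the last cell; local score is the table maximum.
    return best if local else score[-1][-1]


if __name__ == "__main__":
    print(alignment_score("XXGATTACAXX", "GATTACA"))              # global
    print(alignment_score("XXGATTACAXX", "GATTACA", local=True))  # local
```

The global score is lowered by the unmatched flanking residues, while the local score reflects only the shared core, which is why local alignment is the usual choice for database searching.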
10 FASTA Search
(http://pir.georgetown.edu/pirwww/search/fasta.html)
(http://www.ebi.ac.uk/fasta33/)
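The k-tuple idea that FASTA builds on can be illustrated with a short sketch: index every k-residue word of the query, look up each word of the database sequence, and count hits per diagonal (database offset minus query offset). A diagonal with many hits marks a candidate region for closer alignment. This covers only that first word-matching step, not the rescoring and joining stages of the real program; the function names are hypothetical.

```python
from collections import Counter, defaultdict


def ktuple_diagonals(query, subject, k=2):
    """Count shared k-tuples per diagonal (subject offset minus query offset)."""
    # Lookup table: each k-tuple of the query -> list of its start positions.
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)

    diagonals = Counter()
    for j in range(len(subject) - k + 1):
        # .get() avoids inserting empty entries into the defaultdict.
        for i in index.get(subject[j:j + k], ()):
            diagonals[j - i] += 1
    return diagonals


def best_diagonal(query, subject, k=2):
    """Return (diagonal, hit_count) for the densest diagonal, or None if no hits."""
    diagonals = ktuple_diagonals(query, subject, k)
    return diagonals.most_common(1)[0] if diagonals else None


if __name__ == "__main__":
    print(best_diagonal("MKVLAAGIV", "QQMKVLAAGIV"))
```

Because only a table lookup is needed per word, this step is much cheaper than filling a full dynamic programming table for every database sequence.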
11 Gapped-BLAST Search
(http://pir.georgetown.edu/pirwww/search/pirnref.shtml)
(http://www.ncbi.nlm.nih.gov/BLAST/)
12 A BLAST Result
13 PSI-BLAST Iterative Search
(http://www.ncbi.nlm.nih.gov/BLAST/)
14 PSI-BLAST
15 II. Family Classification Methods
- Multiple Sequence Alignment and Phylogenetic Analysis
- ClustalW: Multiple Sequence Alignment
- Alignment Editor and Phylogenetic Trees
- Searches Based on Family Information
- PROSITE Pattern Search
- Motif and Profile Search
- Hidden Markov Models (HMMs)
16 Multiple Sequence Alignment
- ClustalW (http://pir.georgetown.edu/pirwww/search/multaln.html)
17 Alignment Editor (Jalview)
(http://www.ebi.ac.uk/clustalw/)
18 Alignment Editor (GeneDoc)
(http://www.psc.edu/biomed/genedoc/)
19 Phylogenetic Analysis
Tree Programs (http://evolution.genetics.washington.edu/phylip.html)
Tree Searches (http://pauling.mbu.iisc.ernet.in/pali/index.html)
20 Phylogenetic Trees (IGFBP Superfamily)
(Radial Tree)
(Phylogram)
21 PROSITE Pattern Search
(http://pir.georgetown.edu/pirwww/search/patmatch.html)
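A PROSITE pattern is close to a regular expression, so a pattern search can be sketched by translating one into the other. In PROSITE syntax, elements are separated by hyphens, `x` matches any residue, `[...]` lists allowed residues, `{...}` lists excluded residues, `(n)` or `(n,m)` gives a repeat count, and `<` or `>` anchors the pattern to the N- or C-terminus. The sketch below handles these common cases only (anchors written inside brackets are not supported), and the function names are hypothetical. The example uses the N-glycosylation site pattern `N-{P}-[ST]-{P}`.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern string into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        anchored_start = element.startswith("<")
        anchored_end = element.endswith(">")
        element = element.strip("<>")

        # Separate the residue specification from an optional repeat: x(2,4)
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = m.group(1), m.group(2)

        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        # [ABC] and single letters are already valid regex syntax.

        if repeat:
            core += "{" + repeat + "}"
        parts.append(("^" if anchored_start else "") + core
                     + ("$" if anchored_end else ""))
    return "".join(parts)


def find_sites(pattern, sequence):
    """Return (1-based start, matched text) for every match, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}"))
    print(find_sites("N-{P}-[ST]-{P}", "MNGSANPTQNVTK"))
```

The lookahead in `find_sites` reports overlapping matches, which a plain `finditer` over the pattern would skip.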
22 Profile Search
(http://bmerc-www.bu.edu/bioinformatics/profile_request.html)
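Where a pattern gives a yes/no match, a profile scores every residue at every aligned position. A minimal sketch of the idea follows: build a log-odds position-specific scoring matrix from a small ungapped alignment, then slide it along a sequence. It assumes a uniform background frequency and a simple pseudocount, which real profile tools refine considerably; it is not the method used by the server above, and all names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Build a log-odds scoring matrix from equal-length, ungapped sequences."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for pos in range(len(alignment[0])):
        counts = Counter(seq[pos] for seq in alignment)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_window(profile, sequence):
    """Return (score, 0-based start) of the best-scoring window.

    The sequence must be at least as long as the profile.
    Residues outside the 20 standard amino acids score 0.
    """
    width = len(profile)
    scored = [
        (sum(profile[i].get(sequence[s + i], 0.0) for i in range(width)), s)
        for s in range(len(sequence) - width + 1)
    ]
    return max(scored)


if __name__ == "__main__":
    profile = build_profile(["GKSG", "GKTG", "GKSG"])
    print(best_window(profile, "AAAGKSGAAA"))
```

Position 3 of the toy alignment accepts both S and T with positive scores, something a single consensus string cannot express.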
23 Hidden Markov Model Search
(http://www.sanger.ac.uk/Software/Pfam/search.shtml)
(http://smart.embl-heidelberg.de)
24 III. Structural Prediction Methods
- Signal Peptide: SIGFIND, SignalP
- Transmembrane Helix: TMHMM, TMAP
- 2D Prediction (α-helix, β-sheet, Coiled-coils): PHD, JPred
- 3D Modeling: Homology Modeling (Modeller, SWISS-MODEL), Threading, Ab-initio Prediction
25 Structure Prediction: A Guide
(http://speedy.embl-heidelberg.de/gtsp/flowchart2.html)
26 Protein Prediction Server
(http://www.cbs.dtu.dk/services/)
27 Signal Peptide Prediction
(http://www.stepc.gr/synaptic/sigfind.html)
(http://www.cbs.dtu.dk/services/SignalP-2.0)
28 Transmembrane Helix
(http://www.cbs.dtu.dk/services/TMHMM/)
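TMHMM itself uses a hidden Markov model, which is beyond a short example. As a much simpler stand-in, the sketch below applies the classic Kyte-Doolittle hydropathy scale with a sliding window: stretches whose average hydropathy is high are candidate membrane-spanning helices. The window of 19 residues and the cutoff of 1.6 are a commonly cited rule of thumb and are treated here as assumptions; the function names are hypothetical, and the results are not expected to agree with TMHMM.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window along the sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def candidate_tm_segments(sequence, window=19, threshold=1.6):
    """Return merged 1-based (start, end) ranges of windows at or above threshold."""
    segments = []
    for i, mean in enumerate(hydropathy_profile(sequence, window)):
        if mean >= threshold:
            start, end = i + 1, i + window
            if segments and start <= segments[-1][1] + 1:
                # Overlapping or adjacent window: extend the previous segment.
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((start, end))
    return segments


if __name__ == "__main__":
    toy = "D" * 10 + "L" * 20 + "K" * 10  # hydrophobic core, charged flanks
    print(candidate_tm_segments(toy))
```

The reported range is wider than the hydrophobic core because every window that passes the cutoff is merged in full; dedicated predictors refine such boundaries.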
29 Protein Structure Prediction
(http://cmgm.stanford.edu/WWW/www_predict.html)
(http://restools.sdsc.edu/biotools/biotools9.html)
30 Structure Prediction Server
(http://cubic.bioc.columbia.edu/predictprotein/)
(http://www.compbio.dundee.ac.uk/WWW_Servers/JPred/jpred.html)
31 3D-Modelling
(http://www.salilab.org/modeller/modeller.html)
(http://www.expasy.ch/swissmod/SWISS-MODEL.html)
32 IV. Protein Family Databases
- Whole Proteins
- PIR: Superfamilies and Families
- COG (Clusters of Orthologous Groups) of Complete Genomes
- ProtoNet: Automated Hierarchical Classification of Proteins
- Protein Domains
- Pfam: Alignments and HMMs of Protein Domains
- SMART: Protein Domain Families
- Protein Motifs
- PROSITE: Protein Patterns and Profiles
- BLOCKS: Protein Sequence Motifs and Alignments
- PRINTS: Protein Sequence Motifs and Signatures
- Integrated Family Databases
- iProClass: Superfamilies/Families, Domains, Motifs, Rich Links
- InterPro: Integrates Pfam, PRINTS, PROSITE, ProDom, SMART
33 Protein Clustering
(http://www.ncbi.nlm.nih.gov/COG/)
34 Protein Domains
- Pfam (http://www.sanger.ac.uk/Software/Pfam/)
- SMART (http://smart.embl-heidelberg.de/smart/show_motifs.pl)
35 Protein Motifs
- PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles. (http://www.expasy.ch/prosite/)
36 Integrated Family Classification
- InterPro: An integrated resource unifying PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, and PIRSF. (http://www.ebi.ac.uk/interpro/search.html)
37 V. Databases of Protein Functions
- Metabolic Pathways, Enzymes, and Compounds
- Enzyme Classification: Classification and Nomenclature of Enzyme-Catalysed Reactions (EC-IUBMB)
- KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways
- LIGAND (at KEGG): Chemical Compounds, Reactions and Enzymes
- EcoCyc: Encyclopedia of E. coli Genes and Metabolism
- MetaCyc: Metabolic Encyclopedia (Metabolic Pathways)
- WIT: Functional Curation and Metabolic Models
- BRENDA: Enzyme Database
- UM-BBD: Microbial Biocatalytic Reactions and Biodegradation Pathways
- Klotho: Collection and Categorization of Biological Compounds
- Cellular Regulation and Gene Networks
- EpoDB: Genes Expressed during Human Erythropoiesis
- BIND: Descriptions of interactions, molecular complexes and pathways
- DIP: Catalogs experimentally determined interactions between proteins
- RegulonDB: Escherichia coli Pathways and Regulation
38 KEGG: Metabolic and Regulatory Pathways
- KEGG is a suite of databases and associated software, integrating our current knowledge on molecular interaction networks, the information of genes and proteins, and of chemical compounds and reactions. (http://www.genome.ad.jp/kegg/kegg2.html)
(http://www.genome.ad.jp/dbget-bin/show_pathway?hsa00590874)
39 BioCyc (EcoCyc/MetaCyc Metabolic Pathways)
- The BioCyc Knowledge Library is a collection of Pathway/Genome Databases (http://biocyc.org/)
40 Protein-Protein Interactions: DIP
(http://dip.doe-mbi.ucla.edu/)
41 Protein-Protein Interaction: BIND
(http://www.bind.ca/)
42 BioCarta: Cellular Pathways
(http://www.biocarta.com/index.asp)
43 VI. Databases of Protein Structures
- Protein Structure and Classification
- PDB: Structures Determined by X-ray Crystallography and NMR
- CATH: Hierarchical Classification of Protein Domain Structures
- SCOP: Familial and Structural Protein Relationships
- FSSP: Protein Fold Family Database
- Protein Sequence-Structure Relationship
- PIR-NRL3D: Protein Sequence-Structure Database
- PIR-RESID: Protein Structure/Post-Translational Modifications
- HSSP: Families and Alignments of Structurally-Conserved Regions
44 PDB Structure Data
(http://www.rcsb.org/pdb/)
45 PDBsum
Summary and Analysis (http://www.biochem.ucl.ac.uk/bsm/pdbsum)
46 Protein Structural Classification
CATH: Hierarchical domain classification of protein structures (http://www.biochem.ucl.ac.uk/bsm/cath_new/)
47 Protein Structural Classification
The SCOP database aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known, including all entries in the PDB.
(http://scop.mrc-lmb.cam.ac.uk/scop/)
48 VII. Proteomic Resources
- GELBANK (http://gelbank.anl.gov): 2D-gel patterns from completed genomes; SWISS-2DPAGE (http://www.expasy.org/ch2d/)
- PEP: Predictions for Entire Proteomes (http://cubic.bioc.columbia.edu/pep/): Summarized analyses of protein sequences
- Proteome BioKnowledge Library (http://www.proteome.com): Detailed information on human, mouse and rat proteomes
- Proteome Analysis Database (http://www.ebi.ac.uk/proteome/): Online application of InterPro and CluSTr for the functional classification of proteins in whole genomes
- Expression Profiling databases: GNF (http://expression.gnf.org/cgi-bin/index.cgi, human and mouse transcriptome), SMD (http://genome-www5.stanford.edu/MicroArray/SMD/, Stanford microarray data analysis), EBI Microarray Informatics (http://www.ebi.ac.uk/microarray/index.html, managing, storing and analyzing microarray data)
49 2D-Gel Image Databases (1)
(http://gelbank.anl.gov/2dgels/index.asp)
50 2D-Gel Image Databases (2)
(http://us.expasy.org/ch2d/2d-index.html)
(http://us.expasy.org/cgi-bin/nice2dpage.pl?P06493)
51 VIII. Proteome Analysis
(http://www.ebi.ac.uk/proteome)
52 Expression Profiling
- Human and Mouse Transcriptome
(http://expression.gnf.org/cgi-bin/index.cgi)
(http://genome-www.stanford.edu/serum/)
53 Lab
- Visit selected websites and analyze some protein sequences of your own choice.
- List of Bioinformatics Resources of this tutorial available: http://pir.georgetown.edu/huz/bioinfo_resource.html
- Try some of the following sequences for analysis:
- 1) well characterized proteins: PIR:A26366 (CYP17), JS0747 (Sp1)
- 2) less characterized proteins: PIR:A59000 (MATER), TrEMBL:Q9QY16 (GRTH)
- 3) hypothetical proteins: PIR:T12515, T00338, T47130; SWISS-PROT:Q9BWT7