Bio-Trac 25 (Proteomics: Principles and Methods)

1
Tutorial: Bioinformatics Resources

(http://pir.georgetown.edu/pirwww/workshop/bioinfo_resource.html)
  • Bio-Trac 25 (Proteomics: Principles and Methods)
  • October 3, 2008
  • Zhang-Zhi Hu, M.D.
  • Research Associate Professor
  • Protein Information Resource, Department of
    Biochemistry and Molecular & Cellular Biology
  • Georgetown University Medical Center

2
What is Bioinformatics?
computer (information) + mouse (biology) = bioinformatics
  • NIH Biomedical Information Science and Technology
    Initiative (BISTI) Working Definition (2000) -
    Research, development, or application of
    computational tools and approaches for expanding
    the use of biological, medical, behavioral or
    health data, including those to acquire, store,
    organize, archive, analyze, or visualize such
    data.

3
Molecular Biology Database Collection
1078 key databases in 14 categories
(http://nar.oxfordjournals.org/cgi/content/full/36/suppl_1/D2)
4
Database Collection in Nucleic Acids Res.
5
Online Access to Database Collection
http://pir.georgetown.edu/pirwww/workshop/2005_database_update.html
2008
http://www.oxfordjournals.org/nar/database/cap/
6
Overview
Database Contents, Search and Retrieval
  1. Text search / Information retrieval
  2. Sequence genomics databases
  3. Protein family databases
  4. Databases of protein functions
  5. Databases of protein structures
  6. Proteomics databases

Lab session
7
Entrez Text Searches
(http://www.ncbi.nlm.nih.gov/Entrez/)
Lab
8
PubMed Literature Database
(http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=Search&DB=PubMed)
PMID: 14640721
Lab
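The same PubMed records can also be reached from a script through NCBI's E-utilities, which this tutorial does not cover. A minimal sketch, assuming the current `esearch.fcgi`/`efetch.fcgi` endpoints and their `db`, `term`, `id`, `rettype` and `retmode` parameters; the helper names are hypothetical, and the code only builds URLs, so nothing is sent over the network.

```python
from urllib.parse import urlencode

# NCBI E-utilities base URL (a programmatic interface, separate from the
# 2008-era query.fcgi web links shown on the slide).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term: str, db: str = "pubmed") -> str:
    """Build an ESearch URL that returns the IDs of records matching a query."""
    return EUTILS + "esearch.fcgi?" + urlencode({"db": db, "term": term})


def efetch_url(uid: str, db: str = "pubmed") -> str:
    """Build an EFetch URL that returns one record as a plain-text abstract."""
    params = {"db": db, "id": uid, "rettype": "abstract", "retmode": "text"}
    return EUTILS + "efetch.fcgi?" + urlencode(params)


if __name__ == "__main__":
    # Only prints the URLs; fetching them (e.g. with urllib.request) is left out.
    print(esearch_url("alpha crystallin AND phosphorylation"))
    print(efetch_url("14640721"))
```

NCBI asks scripted users to limit their request rate, so check the E-utilities documentation before running queries in bulk.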
9
iProLINK: Protein Literature Mining Resource
RLIMS-P: text mining for protein phosphorylation
BioThesaurus: gene/protein name thesaurus (synonyms, ambiguous
names)
http://pir.georgetown.edu/iprolink/
Lab
10
BioThesaurus: gene/protein name searches (synonyms,
ambiguous names)
Synonyms of CRYAA: crystallin, alpha A; CRYA1; HSPB4
http://pir.georgetown.edu/iprolink/biothesaurus
Lab
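A name thesaurus is essentially an inverted index from names to genes: a name that points to more than one gene is ambiguous. A minimal sketch of that idea; the CRYAA synonyms come from the slide, while `GENE_A`, `GENE_B` and the shared name `NAME_X` are hypothetical placeholders, and this is not how BioThesaurus itself is implemented.

```python
from collections import defaultdict

# Toy thesaurus: gene symbol -> names and synonyms.
# CRYAA synonyms are from the slide; GENE_A/GENE_B and "NAME_X" are
# hypothetical placeholders used only to illustrate an ambiguous name.
THESAURUS = {
    "CRYAA": ["crystallin, alpha A", "CRYA1", "HSPB4"],
    "GENE_A": ["NAME_X"],
    "GENE_B": ["NAME_X"],
}


def build_index(thesaurus):
    """Invert the thesaurus into a case-insensitive name -> gene symbols index."""
    index = defaultdict(set)
    for symbol, names in thesaurus.items():
        for name in [symbol, *names]:
            index[name.lower()].add(symbol)
    return index


def lookup(index, name):
    """Sorted gene symbols a name can refer to; more than one means ambiguous."""
    return sorted(index.get(name.lower(), set()))


index = build_index(THESAURUS)
print(lookup(index, "hspb4"))   # ['CRYAA']
print(lookup(index, "name_x"))  # ['GENE_A', 'GENE_B']
```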
11
RLIMS-P: text mining for protein phosphorylation
http://pir.georgetown.edu/iprolink/rlimsp/
Lab
12
PIR Text Search (I)
(http://pir.georgetown.edu/pirwww/search/textsearch.html)
Google-type search vs.
Boolean searches (AND, OR, NOT)
Lab
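A Google-type search ranks documents by relevance, while a Boolean search returns exactly the set of entries that satisfies the expression. A minimal sketch of the Boolean side over an inverted index, where AND, OR and NOT become set intersection, union and difference; the three records are hypothetical text, not real PIR entries.

```python
# Hypothetical mini-records standing in for database entries (not real PIR data).
RECORDS = {
    "rec1": "alpha crystallin A chain lens protein",
    "rec2": "alpha crystallin B chain heat shock protein",
    "rec3": "argininosuccinate lyase delta crystallin",
}


def build_inverted_index(records):
    """Map each lower-cased word to the set of record IDs containing it."""
    index = {}
    for rec_id, text in records.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(rec_id)
    return index


def hits(index, word):
    """Records containing a word (empty set if the word never occurs)."""
    return index.get(word.lower(), set())


index = build_inverted_index(RECORDS)
all_ids = set(RECORDS)

# AND -> intersection, OR -> union, NOT -> difference from the full set.
and_result = hits(index, "crystallin") & hits(index, "lens")               # {'rec1'}
or_result = hits(index, "lyase") | hits(index, "shock")                    # {'rec2', 'rec3'}
not_result = hits(index, "crystallin") & (all_ids - hits(index, "alpha"))  # {'rec3'}
print(sorted(and_result), sorted(or_result), sorted(not_result))
```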
13
PIR Text Search (II)
Which alpha crystallin A chains are classified in
protein families?
null = field absent; not null = field present
Search for synonyms
Lab
14
PIR Text Search (III)
Which crystallins are enzymes, and which
families do they belong to?
Which crystallins have a determined 3D
structure?
Argininosuccinate lyase (EC 4.3.2.1)
Lab
15
UniProt Text Search
http://www.uniprot.org/
Find proteins related to diabetes that have a
determined 3D structure.
Lab
16
Search continues
Lab
17
I. Sequence Genomics Databases
  • NCBI Resources
  • GenBank: An annotated collection of all publicly
    available nucleotide and protein sequences.
  • RefSeq: NCBI non-redundant set of reference
    sequences, including genomic DNA, transcript
    (RNA), and protein products.
  • Entrez Gene: Gene-centered information at NCBI.
  • UniGene: Unified clusters of ESTs and full-length
    mRNA sequences.
  • OMIM: Online Mendelian Inheritance in Man, a
    catalog of human genetic and genomic disorders.
  • UniProt Consortium Database: Universal Protein
    Resource, a central repository of protein
    sequence and function.
  • Model Organism Genome Databases: MGD, RGD, SGD,
    FlyBase
  • GeneCards: Integrated database of human genes,
    maps, proteins and diseases.
  • SNP Consortium Database (dbSNP), International
    HapMap Project: Genes associated with human
    diseases

(http://www.oxfordjournals.org/nar/database/cap/)
18
UniProt Consortium Databases
Universal Protein Resource
(http://www.uniprot.org)
Since October 2002
Since July 2008
19
UniProt Report (I)
Lab
Sections of the record
Entry View Sequence Annotation
http://www.uniprot.org/uniprot/P02493
20
UniProt Report (II) sequence and features
Lab
21
UniProt Report (III) UniRef90
http://www.uniprot.org/uniref/?querymember3aP02493identity0.9
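UniRef90 groups sequences that share at least 90% sequence identity. A minimal sketch of the two ingredients, percent identity of two pre-aligned sequences and greedy seed-based clustering at a 0.9 threshold; UniProt builds UniRef with dedicated clustering software and additional length-overlap criteria, so this loop is a simplified stand-in, and the sequences in the usage line are made up.

```python
def percent_identity(a: str, b: str) -> float:
    """Fractional identity of two pre-aligned, equal-length sequences ('-' = gap).
    Columns where both sequences have a gap are ignored."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return matches / len(columns)


def greedy_cluster(seqs: dict, threshold: float = 0.9) -> list:
    """Greedy clustering: longest sequences first; each sequence joins the first
    seed it matches at >= threshold identity, otherwise it becomes a new seed.
    Returns a list of (seed_id, member_ids)."""
    order = sorted(seqs, key=lambda k: len(seqs[k].replace("-", "")), reverse=True)
    clusters = []
    for sid in order:
        for seed, members in clusters:
            if percent_identity(seqs[seed], seqs[sid]) >= threshold:
                members.append(sid)
                break
        else:  # no seed was close enough
            clusters.append((sid, [sid]))
    return clusters


# Hypothetical, already-aligned toy sequences.
print(greedy_cluster({"s1": "ACDEFGHIKL", "s2": "ACDEFGHIKM", "s3": "WWWWWGHIKL"}))
# [('s1', ['s1', 's2']), ('s3', ['s3'])]
```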
22
Entrez Gene: Gene-centric information
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?dbgenecmdRetrievedoptGraphicslist_uids12954ubor0_RefSeq
23
OMIM: Online Mendelian Inheritance in Man
Autosomal recessive congenital progressive
cataract
Juvenile cataract of Down syndrome
(http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=123580)
24
II. Protein Family Databases
  • Whole Proteins
  • PIRSF: Nonoverlapping Classification of Full-Length
    Proteins Based on Evolutionary Relationship
  • COG: Clusters of Orthologous Groups of Complete
    Genomes
  • PANTHER: Proteins Classified into
    Families/Subfamilies of Shared Function
  • ProtoNet: Automatic Hierarchical Classification
    of Proteins
  • Protein Domains
  • Pfam: Alignments and HMM Models of Protein
    Domains
  • SMART: Protein Domain Identification and
    Annotation
  • CDD: Conserved Domain Database
  • Protein Motifs
  • PROSITE: Protein Patterns and Profiles
  • BLOCKS: Protein Sequence Motifs and Alignments
  • PRINTS: Compendium of Protein Fingerprints (a
    group of conserved motifs)
  • Integrated Family Databases
  • InterPro: Integrates Pfam, PRINTS, PROSITE,
    ProDom, SMART, PIRSF, SuperFamily

25
Protein Clustering
Initial version
COGs (http://www.ncbi.nlm.nih.gov/COG/)
New version: includes eukaryotic clusters (KOGs)
26
PIRSF Full-Length Classification: iProClass
Family Report
Lab
(http://pir.georgetown.edu/cgi-bin/ipcSF?id=SF002280)
27
Domain Classification: Pfam Domain
(http://www.sanger.ac.uk/cgi-bin/Pfam/swisspfamget.pl?name=CRYAA_RABIT)
(http://pir.georgetown.edu/cgi-bin/ipcEntry?id=P02493)
28
Pfam Domain
(http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF00525)
29
Protein Motifs: PROSITE, a database of protein
families and domains. It consists of biologically
significant sites, patterns and profiles.
(http://us.expasy.org/prosite/)
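PROSITE patterns use a compact syntax that maps almost one-to-one onto regular expressions. A minimal sketch of that translation; the mapping covers the common pattern elements (`x`, `[..]`, `{..}`, `(n,m)` repeats, `<`/`>` terminal anchors), `find_sites` is a hypothetical helper, and the scanned sequence is made up. PROSITE profiles are a different, score-based method and are not covered.

```python
import re

# One-pass character mapping from PROSITE pattern syntax to Python regex syntax.
_PROSITE_TO_RE = str.maketrans({
    "-": "",    # element separator, dropped
    "x": ".",   # any amino acid
    "{": "[^",  # {P}: any residue except P
    "}": "]",
    "(": "{",   # x(2,4): repeat the preceding element 2 to 4 times
    ")": "}",
    "<": "^",   # pattern anchored at the N-terminus
    ">": "$",   # pattern anchored at the C-terminus
})


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern such as 'N-{P}-[ST]-{P}.' to a regex string.
    Not handled: '>' inside brackets (e.g. '[G>]'), which PROSITE allows."""
    return pattern.rstrip(".").translate(_PROSITE_TO_RE)


def find_sites(pattern: str, sequence: str) -> list:
    """1-based start positions of all matches, including overlapping ones
    (the zero-width lookahead lets the scan test every residue)."""
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() + 1 for m in regex.finditer(sequence)]


# N-glycosylation site pattern (PROSITE PS00001) on a made-up sequence.
print(prosite_to_regex("N-{P}-[ST]-{P}."))        # N[^P][ST][^P]
print(find_sites("N-{P}-[ST]-{P}", "MNASANGTP"))  # [2]
```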
30
Integrated Family Classification
  • InterPro
  • An integrated resource unifying PROSITE, PRINTS,
    ProDom, Pfam, SMART, TIGRFAMs, and PIRSF.
    (http://www.ebi.ac.uk/interpro/search.html)

Mapping of families
31
III. Databases of Protein Functions
  • Metabolic Pathways, Enzymes, and Compounds
  • Enzyme Classification: Classification and
    Nomenclature of Enzyme-Catalysed Reactions
    (EC-IUBMB)
  • KEGG (Kyoto Encyclopedia of Genes and Genomes):
    Metabolic Pathways
  • LIGAND (at KEGG): Chemical Compounds, Reactions
    and Enzymes
  • EcoCyc: Encyclopedia of E. coli Genes and
    Metabolism
  • MetaCyc: Metabolic Encyclopedia (Metabolic
    Pathways)
  • BRENDA: Enzyme Database
  • UM-BBD: Microbial Biocatalytic Reactions and
    Biodegradation Pathways
  • Inter-Molecular Interactions and Regulatory
    Pathways
  • IntAct: Protein interaction data from literature
    and user submission
  • BIND: Descriptions of interactions, molecular
    complexes and pathways
  • DIP: Catalogs experimentally determined
    interactions between proteins
  • Reactome: A curated knowledgebase of biological
    pathways
  • BioCarta: Biological pathways of human and mouse
  • GO: Gene Ontology Consortium Database
  • Pathway Resources: Pathguide
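EC numbers are hierarchical: class, subclass, sub-subclass and serial number, so 4.3.2.1 (argininosuccinate lyase, used in the text-search exercises) sits under class 4, the lyases. A minimal sketch that splits an EC number into its cumulative levels; the class names follow the IUBMB nomenclature, and the helper names are hypothetical.

```python
# Top-level EC classes from the IUBMB Enzyme Nomenclature
# (class 7, translocases, was added in 2018, after this tutorial).
EC_CLASSES = {
    "1": "Oxidoreductases",
    "2": "Transferases",
    "3": "Hydrolases",
    "4": "Lyases",
    "5": "Isomerases",
    "6": "Ligases",
    "7": "Translocases",
}


def ec_levels(ec: str) -> list:
    """Cumulative levels of an EC number: class, subclass, sub-subclass, serial.
    A '-' marks an unspecified level, so e.g. '4.3.-.-' stops after '4.3'."""
    parts = ec.split(".")
    if len(parts) != 4 or parts[0] not in EC_CLASSES:
        raise ValueError(f"not a four-part EC number: {ec!r}")
    levels = []
    for depth, part in enumerate(parts, start=1):
        if part == "-":
            break
        levels.append(".".join(parts[:depth]))
    return levels


def ec_class_name(ec: str) -> str:
    """Name of the top-level class, e.g. 'Lyases' for 4.3.2.1."""
    return EC_CLASSES[ec.split(".")[0]]


print(ec_levels("4.3.2.1"))      # ['4', '4.3', '4.3.2', '4.3.2.1']
print(ec_class_name("4.3.2.1"))  # Lyases
```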

32
Biological Pathway Resource Collection
http://www.pathguide.org/
  • Protein-protein interactions
  • Metabolic pathways
  • Signaling pathways
  • Pathway diagrams
  • Transcription factors / gene regulatory networks
  • Protein-compound interactions
  • Genetic interaction networks

33
Pathway Commons
Search across multiple pathway databases in a common
format for global analysis
http://www.pathwaycommons.org/pc/home.do
34
KEGG Metabolic Regulatory Pathways
Lab
  • KEGG is a suite of databases and associated
    software, integrating our current knowledge
    on molecular interaction networks, the
    information of genes and proteins, and of
    chemical compounds and reactions.
    (http://www.genome.ad.jp/kegg/kegg2.html)

(http://www.genome.ad.jp/dbget-bin/show_pathway?hsa002204.3.2.1)
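A metabolic pathway map can be treated as a graph: compounds are nodes and enzyme-catalysed reactions are edges labelled with EC numbers, such as 4.3.2.1 in the link above. A minimal sketch using a hand-written, simplified urea-cycle fragment (textbook biochemistry, not data fetched from KEGG; co-substrates and side products such as carbamoyl phosphate, aspartate, fumarate and urea are omitted); `enzyme_path` is a hypothetical helper doing breadth-first search.

```python
from collections import deque

# Simplified urea-cycle fragment: substrate -> list of (product, EC number).
UREA_CYCLE = {
    "ornithine": [("citrulline", "2.1.3.3")],          # ornithine carbamoyltransferase
    "citrulline": [("argininosuccinate", "6.3.4.5")],  # argininosuccinate synthase
    "argininosuccinate": [("arginine", "4.3.2.1")],    # argininosuccinate lyase
    "arginine": [("ornithine", "3.5.3.1")],            # arginase
}


def enzyme_path(graph, start, goal):
    """Breadth-first search; returns the EC numbers along the shortest
    route from start to goal, or None if goal is unreachable."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, ecs = queue.popleft()
        if node == goal:
            return ecs
        for product, ec in graph.get(node, []):
            if product not in seen:
                seen.add(product)
                queue.append((product, ecs + [ec]))
    return None


print(enzyme_path(UREA_CYCLE, "citrulline", "ornithine"))
# ['6.3.4.5', '4.3.2.1', '3.5.3.1']
```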
35
BioCyc: EcoCyc/MetaCyc Metabolic Pathways
  • The BioCyc Knowledge Library is a collection of
    Pathway/Genome Databases (http://biocyc.org/)

36
BioCarta Cellular Pathways
(http://www.biocarta.com/index.asp)
37
Reactome (http://www.reactome.org/)
  • Collaboration of CSHL, EBI and GO Consortium
  • Curated resource of core pathways and reactions
    in human biology
  • Authored by biological researchers who are
    experts in their fields
  • Cross-referenced with NCBI, Ensembl, UniProt,
    HapMap, and KEGG
  • Inferred orthologous events in 22 non-human
    species (e.g., mouse, rat)

38
Transforming Growth Factor (TGF) beta signaling
Homo sapiens
(http://reactome.org/cgi-bin/eventbrowser?DB=gk_current&FOCUS_SPECIES=Homo%20sapiens&ID=170834)
Reactome events and objects (including modified
forms and complexes):
  • Event -> REACT_6879.1: Activated type I receptor
    phosphorylates R-SMAD directly (Homo sapiens)
  • Object -> REACT_7364.1: Phospho-R-SMAD (cytosol)
  • Event -> REACT_6760.1: Phospho-R-SMAD forms a
    complex with CO-SMAD (Homo sapiens)
  • Object -> REACT_7344.1: Phospho-R-SMAD:CO-SMAD
    complex (cytosol)
  • Event -> REACT_6726.1: The phospho-R-SMAD:CO-SMAD
    complex transfers to the nucleus
  • Object -> REACT_7382.2: Phospho-R-SMAD:CO-SMAD
    complex (nucleoplasm)
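The event/object list above is a chain: each event's output object becomes an input of the next event. A minimal sketch of that data structure and a forward walk through it; the event and object identifiers and names come from the slide, but the input entities (and the cytosolic R-SMAD and CO-SMAD) are assumptions inferred from the event names, not Reactome data, and `follow` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str    # Reactome event identifier (2008-era REACT_ IDs from the slide)
    name: str
    inputs: tuple    # input entities; ASSUMED from the event names, not from Reactome
    output: str      # output entity, with compartment in brackets
    output_id: str   # Reactome object identifier from the slide


EVENTS = [
    Event("REACT_6879.1", "Activated type I receptor phosphorylates R-SMAD directly",
          ("R-SMAD [cytosol]",), "Phospho-R-SMAD [cytosol]", "REACT_7364.1"),
    Event("REACT_6760.1", "Phospho-R-SMAD forms a complex with CO-SMAD",
          ("Phospho-R-SMAD [cytosol]", "CO-SMAD [cytosol]"),
          "Phospho-R-SMAD:CO-SMAD complex [cytosol]", "REACT_7344.1"),
    Event("REACT_6726.1", "The phospho-R-SMAD:CO-SMAD complex transfers to the nucleus",
          ("Phospho-R-SMAD:CO-SMAD complex [cytosol]",),
          "Phospho-R-SMAD:CO-SMAD complex [nucleoplasm]", "REACT_7382.2"),
]


def follow(events, entity):
    """Follow an entity forward through the events that consume it.
    Returns (event_id, output_id) pairs in reaction order."""
    consumer = {}
    for ev in events:
        for inp in ev.inputs:
            consumer.setdefault(inp, ev)
    chain, seen = [], set()
    while entity in consumer and entity not in seen:  # `seen` guards against cycles
        seen.add(entity)
        ev = consumer[entity]
        chain.append((ev.event_id, ev.output_id))
        entity = ev.output
    return chain


for event_id, object_id in follow(EVENTS, "R-SMAD [cytosol]"):
    print(event_id, "->", object_id)
```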
39
Protein-Protein Interaction Database - IntAct
(http://www.ebi.ac.uk/intact/)
40
Gene Ontology (GO)
(http://www.geneontology.org/)
  • Molecular Function
  • Biological Process
  • Cellular Component
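Within each of the three GO namespaces, terms form a directed acyclic graph in which a term can have several parents, and an annotation to a specific term implies all the more general terms above it (the "true path rule"). A minimal sketch of that upward propagation over a toy graph; `term_A` to `term_D` are hypothetical placeholders, only `cellular_component` mirrors a real GO root, and real work would load the ontology file published by the GO Consortium.

```python
# Toy ontology fragment: child term -> parent terms (is_a/part_of not distinguished).
PARENTS = {
    "term_A": ["term_C"],
    "term_B": ["term_C", "term_D"],   # several parents: a DAG, not a tree
    "term_C": ["cellular_component"],
    "term_D": ["cellular_component"],
}


def ancestors(term, parents):
    """All terms reachable upward from `term`, excluding the term itself."""
    found, stack = set(), list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in found:
            found.add(t)
            stack.extend(parents.get(t, []))
    return found


def propagate(annotations, parents):
    """True-path rule: a gene annotated to a term is implicitly annotated
    to every ancestor of that term."""
    return {
        gene: set(terms).union(*(ancestors(t, parents) for t in terms))
        for gene, terms in annotations.items()
    }


print(sorted(ancestors("term_B", PARENTS)))  # ['cellular_component', 'term_C', 'term_D']
print(propagate({"gene1": ["term_A"]}, PARENTS))
```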
41
IV. Databases of Protein Structures
  • Protein Structure
  • PDB: Structures Determined by X-ray
    Crystallography and NMR
  • PDBsum: Summaries and analyses of PDB structures
  • MMDB: NCBI's database of 3D structures, part of
    NCBI Entrez
  • SWISS-MODEL Repository: Database of annotated
    protein 3D models
  • ModBase: Annotated comparative protein structure
    models
  • Structure Classification
  • CATH: Hierarchical Classification of Protein
    Domain Structures
  • SCOP: Familial and Structural Protein
    Relationships
  • FSSP: Protein Fold Classification Based on
    Structure-Structure Alignment

42
PDB Experimental 3D Structure Repository
Rat gamma-crystallin (chains A, B)
Can you do a text search at PIR to find this
entry (CRGE_RAT)?
(http://www.rcsb.org/pdb/)
Lab
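Entries such as the two-chain gamma-crystallin above can be downloaded in the legacy PDB text format, whose ATOM records use fixed columns. A minimal sketch that reads those columns to count residues per chain and collect C-alpha coordinates; the helper names are hypothetical, HETATM records (waters, ligands) are skipped, and very large entries are distributed only as mmCIF, which this does not parse.

```python
from collections import defaultdict


def residues_per_chain(pdb_lines):
    """Count distinct residues per chain from ATOM records.
    0-based slices of the fixed PDB columns: residue name [17:20],
    chain ID [21], residue number [22:26], insertion code [26]."""
    residues = defaultdict(set)
    for line in pdb_lines:
        if line.startswith("ATOM  "):
            res_id = (line[22:26].strip(), line[26], line[17:20])
            residues[line[21]].add(res_id)
    return {chain: len(ids) for chain, ids in residues.items()}


def ca_coordinates(pdb_lines, chain):
    """(x, y, z) of C-alpha atoms of one chain, in file order.
    Atom name is columns [12:16]; x, y, z are [30:38], [38:46], [46:54]."""
    coords = []
    for line in pdb_lines:
        if line.startswith("ATOM  ") and line[21] == chain and line[12:16].strip() == "CA":
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file name): python script.py entry.pdb
    with open(sys.argv[1]) as fh:
        lines = fh.readlines()
    print(residues_per_chain(lines))
```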
43
PDBsum
Pictorial Database Providing Summaries and
Analyses of PDB Entries
Search
3-D structure summary
2-D structure summary
(http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/)
44
Protein Structural Classification (1)
CATH: Hierarchical domain classification of
protein structures (http://www.cathdb.info/)
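A CATH code gives one number per level of the hierarchy (Class, Architecture, Topology, Homologous superfamily), so comparing two codes from the left shows how closely two domains are related; agreement through Topology means they share a fold. A minimal sketch of that comparison; the helper is hypothetical and the codes in the usage lines are illustrative, not claims about particular proteins.

```python
# CATH hierarchy levels, top to bottom; a domain's CATH code has one number per level.
CATH_LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")
CATH_CLASSES = {"1": "Mainly Alpha", "2": "Mainly Beta", "3": "Alpha Beta",
                "4": "Few Secondary Structures"}


def shared_cath_level(code_a: str, code_b: str):
    """Deepest CATH level at which two domain codes agree, or None if they
    already differ at Class."""
    a, b = code_a.split("."), code_b.split(".")
    if len(a) != 4 or len(b) != 4:
        raise ValueError("expected four-part CATH codes such as '2.60.40.10'")
    depth = 0
    for x, y in zip(a, b):
        if x != y:
            break
        depth += 1
    return CATH_LEVELS[depth - 1] if depth else None


# Illustrative codes only.
print(shared_cath_level("2.60.40.10", "2.60.40.99"))  # Topology
print(CATH_CLASSES["2.60.40.10".split(".")[0]])        # Mainly Beta
```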
45
Protein Structural Classification (2)
SCOP: A comprehensive description of the structural
and evolutionary relationships between all
proteins whose structure is known.
(http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.html)
46
SWISS-MODEL Repository
http://swissmodel.expasy.org/
http://swissmodel.expasy.org/repository/
A database of annotated three-dimensional
comparative protein structure models
(http://swissmodel.expasy.org/repository/smr.php?sptr_acCRBA1_MOUSEjob2)
47
V. Proteomic Resources
  • GELBANK (http://gelbank.anl.gov): 2D-gel patterns
    of species with completed genomes.
  • SWISS-2DPAGE (http://www.expasy.org/ch2d/): index
    of 2D gels
  • PEP (http://cubic.bioc.columbia.edu/pep/):
    Predictions for Entire Proteomes, summarized
    analyses of protein sequences
  • Integr8 (http://www.ebi.ac.uk/integr8/): A
    browser for information relating to completed
    genomes and proteomes, based on data contained in
    Genome Reviews and the UniProt proteome sets
  • PRIDE (http://www.ebi.ac.uk/pride/): PRoteomics
    IDEntifications database; expression profiling
    databases
  • GPMdb (http://gpmdb.thegpm.org/): mass spec
    proteomics databases
  • PeptideAtlas (http://www.peptideatlas.org/):
    compendium of peptides identified in a large set
    of tandem mass spectrometry proteomic experiments
  • HUPO (http://www.hupo.org/): Human Proteome
    Organization, to foster international proteomics
    initiatives.
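Tandem MS resources such as PeptideAtlas and GPMdb deal in peptides, usually produced by trypsin digestion, and identification tools compare spectra against in-silico digests of protein sequences. A minimal sketch of such a digest using the simplified textbook rule (cleave after K or R unless the next residue is P), with optional missed cleavages; real search tools use configurable enzyme rules, and the test sequence is made up.

```python
import re


def tryptic_peptides(sequence: str, missed_cleavages: int = 0) -> list:
    """In-silico trypsin digest: cleave after K or R unless followed by P.
    With missed_cleavages = n, also return peptides spanning up to n
    uncleaved sites."""
    if not sequence:
        return []
    # Cut positions sit just after each K/R that is not followed by P.
    cuts = [m.end() for m in re.finditer(r"[KR](?!P)", sequence)]
    bounds = [0] + [c for c in cuts if c < len(sequence)] + [len(sequence)]
    fragments = [sequence[i:j] for i, j in zip(bounds, bounds[1:])]
    peptides = []
    for i in range(len(fragments)):
        for k in range(missed_cleavages + 1):
            if i + k < len(fragments):
                peptides.append("".join(fragments[i:i + k + 1]))
    return peptides


print(tryptic_peptides("MAKRPGKAR"))     # ['MAK', 'RPGK', 'AR']
print(tryptic_peptides("MAKRPGKAR", 1))  # ['MAK', 'MAKRPGK', 'RPGK', 'RPGKAR', 'AR']
```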

48
2D-Gel Image Databases
Lab
(http://us.expasy.org/ch2d/)
Part of the WORLD-2DPAGE index to 2-D PAGE databases
and services
(http://us.expasy.org/swiss-2dpage/ac=P02489)
49
GPMdb MS Data Search
(http://gpmdb.thegpm.org/)
Craig et al., J Proteome Res. 2004, 3:1234-42.
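Matching MS data to peptides starts from their masses. A minimal sketch computing a peptide's neutral monoisotopic mass (sum of residue masses plus one water) and the m/z of its [M+zH]z+ ion; the values are standard monoisotopic residue masses for unmodified amino acids, no modifications (such as alkylated cysteine) are applied, and this is not GPMdb's own code.

```python
# Monoisotopic residue masses in daltons (unmodified amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O for the free N- and C-termini
PROTON = 1.00728   # proton mass, added once per positive charge


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mz(peptide: str, charge: int = 1) -> float:
    """m/z of the [M+zH]z+ ion for a positive charge state z."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (peptide_mass(peptide) + charge * PROTON) / charge


print(round(peptide_mass("PEPTIDE"), 4))  # 799.3599
print(mz("PEPTIDE", 2))
```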
50
PRIDE: centralized, standards-compliant, public
data repository for proteomics data
http://www.ebi.ac.uk/pride/
51
Lab
  • Text search / Information retrieval
  • Literature search and text mining
  • Finding synonyms (BioThesaurus)
  • Information extraction (e.g., protein
    phosphorylation sites)
  • Find the sequence for the rabbit alpha crystallin
    A chain
  • Find all alpha crystallin A chains classified in
    protein families
  • Search for crystallins that have enzyme
    activity
  • Find crystallins that have determined 3D
    structures
  • Database contents (reports)
  • Sequence genomics databases (UniProt)
  • Protein family databases (PIRSF)
  • Databases of protein functions (KEGG)
  • Databases of protein structures (PDB)
  • Proteomics databases (SWISS-2DPAGE)
  • Protein Examples
  • Rabbit alpha crystallin A (UniProtKB:
    CRYAA_RABIT/P02493)
  • Delta crystallin II (Argininosuccinate lyase)
    (UniProtKB: ARLY2_ANAPL/P24058)
  • Any additional proteins of interest for
    search and retrieval