Title: October 10, 2003
1Demo Protein Information Resource
- October 10, 2003
- NIH Proteomics Workshop
- Bethesda, MD
- Raja Mazumder, Ph.D.
- Scientific Coordinator and Senior Protein Scientist, PIR
2Database Demo
- NREF Database
- http://pir.georgetown.edu/pirwww/search/pirnref.shtml
- NREF Entry (NF00091113)
- iProClass Database
- http://pir.georgetown.edu/iproclass/
- iProClass Sequence (A58910), Motif (PCM00487)
- PIR-PSD Database
- http://pir.georgetown.edu/pirwww/search/textpsd.shtml
- PIR Entry (A58910)
- Other Molecular Databases
- Function: KEGG Enzyme (EC 1.1.1.205), KEGG Pathway (MAP00230), BRENDA (EC 1.1.1.205)
- Structure: PDB (1AK5), SCOP (Alanine Racemase), CATH (1AK5)
- Domain: Pfam (PF00478), CDD (HemL)
- Classification: COGs (COG0001)
3PIR Web Site (http://pir.georgetown.edu)
4Text Search Result
5Text Search Result with NULL/NOT NULL
6Peptide Search Results
7PIR-NREF Search Result (I)
Test Sequence: ftp://nbrfa.georgetown.edu/pir/misc/test.seq
8PIR-NREF Search Result (II)
9HMM Domain/Motif Search
10PIR Pattern Search
11PIR Pattern Search Result (I)
- http://pir.georgetown.edu/pirwww/search/patmatch.html
- Pattern Match: Sequence vs. PROSITE
12PIR Pattern Search Result (II)
- Search a query pattern against a sequence database.
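The pattern search described on this slide can be illustrated with a minimal sketch. It assumes the query is written in PROSITE pattern syntax and translates it into a regular expression before scanning each sequence. The function names, the sequence identifiers, and the toy sequences are hypothetical, and this is not the implementation behind the PIR pattern-match page.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern (e.g. C-x(2,4)-C) into a Python regex."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        anchor_start = element.startswith("<")  # '<' anchors to the N-terminus
        anchor_end = element.endswith(">")      # '>' anchors to the C-terminus
        element = element.strip("<>")
        # Split the element into its core and an optional repeat count,
        # e.g. "x(2,4)" -> core "x", low "2", high "4".
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":              # any residue
            regex = "."
        elif core.startswith("{"):   # excluded residues
            regex = "[^" + core[1:-1] + "]"
        else:                        # single residue or [allowed residues]
            regex = core
        if low and high:
            regex += "{%s,%s}" % (low, high)
        elif low:
            regex += "{%s}" % low
        parts.append(("^" if anchor_start else "") + regex + ("$" if anchor_end else ""))
    return "".join(parts)


def search_database(pattern: str, database: dict) -> list:
    """Return (sequence id, 1-based start, matched text) for every hit."""
    # The lookahead lets overlapping matches be reported as well.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    hits = []
    for seq_id, sequence in database.items():
        for m in regex.finditer(sequence):
            hits.append((seq_id, m.start() + 1, m.group(1)))
    return hits


# Hypothetical toy database; real searches would read sequences from a file.
database = {"seq1": "MKNGSAAANPTQ", "seq2": "AAAAAA"}
hits = search_database("N-{P}-[ST]-{P}", database)
print(hits)
```

Only the first asparagine in seq1 is reported, because the second one is followed by a proline, which the pattern excludes.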
13PIR Domain Display
14PIR-NREF Database (http://pir.georgetown.edu/pirwww/search/pirnref.shtml)
15PIR-NREF Report
16Related Sequences
17PIR-iProClass Database
18iProClass Sequence Report
19PDB Structure of Molecule Inosine-5'-Monophosphate Dehydrogenase
20Development of protein sequence databases
- Atlas of Protein Sequence and Structure: Dayhoff (1966), first sequence database (pre-bioinformatics). Currently known as the Protein Information Resource (PIR)
- Protein Data Bank (PDB): structural database (1972); remains the most widely used database of structures
- SWISS-PROT: protein sequence database (1987); still in use; not exhaustive but heavily annotated
- UniProt: The United Protein Databases (UniProt, 2003) will create a central database of protein sequence and function by joining the forces of the SWISS-PROT, TrEMBL and PIR protein database activities
21Protein Family Classification
Discovery of New Knowledge by Using Information
Embedded within Families of Homologous Sequences
and Their Structures
- Superfamily and Domain Classification
- Superfamily Concept
- End-to-End Similarity: Same Overall Domain Architecture
- Significance
- Improve Sensitivity of Protein Identification
- Provide Complete Clustering for Database Organization
- Detect and Correct Genome Annotation Errors Systematically
- Drive Other Annotations
- Stimulate Evolution, Genomics and Proteomics Research
22Protein Family/Superfamily Definitions
- Family
- A Set of Protein Sequences That Share a Common Evolutionary Ancestor with End-to-End Sequence Similarity (No Major Discrepancy by Standard Multiple Alignment Methods)
- Have the Same Domain Architecture (Except Incomplete or Alternately Spliced)
- Overall Sequence Identity ?
- Superfamily
- A Set of Protein Families That Share a Common Evolutionary Ancestor From End-to-End
- Have the Same Domain Architecture
- Overall Sequence Identity ?
- Best-hit rule
23Protein Domain Definition
- Domain
- Domains can be described as discrete, structurally conserved units in proteins that are evolutionarily mobile
- They typically correspond to discrete globular folding units in the structure of a protein and may often occur independently of other domains in the protein
- A Recognizable Region of Similarity
- Have a Common Ancestry
- Found in Diverse Protein Sequences (in > 2 Superfamilies)
- A Sequence Can Belong to Only One Protein Family and Superfamily, but May Contain More Than One Domain.
24Network structure of protein classification
- Structural fold: P-loop NTPase
- Domain superfamily: AAA ATPases
- Homeomorphic family: Replicative DNA helicase
- Members: VACa-D5Rb, MCV-MC094R, SFV-gp080R, FPV-FPV058, MSV-MSV089, AMV-AMV087
- Domain superfamily: DNA pumping ATPases
- Homeomorphic family: ATPase
- Members: VAC-A32L, MCV-MC140L, SFV-gp120L, FPV-FPV197, MSV-MSV171, AMV-AMV150
- Domain superfamily: RecA/SF1/SF2 helicase lineage
- Homeomorphic family: Nucleic acid helicase
- Members: VAC-A18R, MCV-MC123R, SFV-gp108R, FPV-FPV183, MSV-MSV148, AMV-AMV059
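The hierarchy on this slide (structural fold, domain superfamilies, homeomorphic families, member proteins) can be sketched as a nested mapping with a reverse index. This is only an illustrative data structure: the variable and function names are hypothetical, the member identifiers are copied from the slide, and it does not reflect how iProClass stores its classification.

```python
# fold -> domain superfamily -> homeomorphic family -> member proteins
classification = {
    "P-loop NTPase": {
        "AAA ATPases": {
            "Replicative DNA helicase": [
                # First identifier copied verbatim from the slide,
                # including what appear to be footnote markers.
                "VACa-D5Rb", "MCV-MC094R", "SFV-gp080R",
                "FPV-FPV058", "MSV-MSV089", "AMV-AMV087",
            ],
        },
        "DNA pumping ATPases": {
            "ATPase": [
                "VAC-A32L", "MCV-MC140L", "SFV-gp120L",
                "FPV-FPV197", "MSV-MSV171", "AMV-AMV150",
            ],
        },
        "RecA/SF1/SF2 helicase lineage": {
            "Nucleic acid helicase": [
                "VAC-A18R", "MCV-MC123R", "SFV-gp108R",
                "FPV-FPV183", "MSV-MSV148", "AMV-AMV059",
            ],
        },
    },
}


def build_index(tree: dict) -> dict:
    """Map each member protein to its (fold, superfamily, family) path."""
    index = {}
    for fold, superfamilies in tree.items():
        for superfamily, families in superfamilies.items():
            for family, members in families.items():
                for member in members:
                    index[member] = (fold, superfamily, family)
    return index


index = build_index(classification)
print(index["VAC-A32L"])
```

The reverse index makes it cheap to answer the question the slide poses visually: which family, superfamily, and fold a given protein belongs to.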
25Network structure of protein classification
26Superfamily-Domain-Motif Relationship
27iProClass Superfamily List
- All Superfamilies Containing PF00001
28iProClass Superfamily Report
29Alignment and Tree View
30PIR-Protein Sequence Database
31PIR-PSD Entry
32BLAST/FASTA Search
33PIR FASTA Search Result
34PIR Searches and Alignment
BLAST Search
35PIR Hidden Markov Model
- http://pir.georgetown.edu/pirwww/search/pirhmm.html
- HMM Model Building and Sequence Search
- One Protein Against All HMMs
- All Proteins Against One HMM
36Bibliography Submission System
37PIR Bibliography Submission
- View Bibliography Information
- View Protein Entry
- Submit Citation with Optional Categorization
38PIR Bibliography Submission
39Bibliography Information Display (I)
- From PIR-NREF
- From Other Curated Database
40Bibliography Information Display (II)
- From User Submission
- From Computer-Mapping (e.g. Gene Symbol)
41Proteomic Bioinformatics
- Large-Scale Analysis of Proteomic Data: Homology Search for Pathways
42PIR Batch retrieval
43PIR Batch Retrieval Results
44Pairwise Alignments
45PIR Pairwise Alignment
46Composition and Molecular Weight Calculation
47Composition and Molecular Weight Calculation
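As a rough illustration of what a composition and molecular weight calculation involves, the sketch below counts residues and sums standard average residue masses, adding one water for the free termini. The function names are hypothetical, and the PIR tool may use different mass tables or handle ambiguous residue codes differently.

```python
from collections import Counter

# Average masses (Da) of the 20 standard amino acid residues,
# i.e. the amino acid minus one water lost in peptide bond formation.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # added once for the N-terminal H and C-terminal OH


def composition(sequence: str) -> dict:
    """Count each residue in the sequence (case-insensitive)."""
    counts = Counter(sequence.upper())
    unknown = set(counts) - set(AVERAGE_RESIDUE_MASS)
    if not counts or unknown:
        raise ValueError("empty sequence or unsupported residues: %s" % sorted(unknown))
    return dict(counts)


def molecular_weight(sequence: str) -> float:
    """Average molecular weight in daltons of an unmodified linear peptide."""
    counts = composition(sequence)
    return sum(AVERAGE_RESIDUE_MASS[aa] * n for aa, n in counts.items()) + WATER_MASS


print(composition("GGA"))
print(round(molecular_weight("G"), 2))  # free glycine, about 75.07
```

Post-translational modifications, disulfide bonds, and ambiguous codes such as B, Z, or X are deliberately not handled here; the sketch raises an error for them instead of guessing.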
48PIR support center