Title: PIRSF Classification System
Slide 1: PIRSF Classification System
Protein Classification and Functional Annotation
Discovery of New Knowledge by Using Information
Embedded within Families of Homologous Sequences
and Their Structures
- PIRSF: Evolutionary relationships of proteins from super- to sub-families
- Homeomorphic Family: Homologous proteins sharing full-length similarity and common domain architecture
- Significance
  - Improve sensitivity of protein identification and functional inference
  - Detect and correct genome annotation errors systematically
  - Provide basis for evolutionary and comparative genomics research
  - Provide basis for automated annotation of protein features, to annotate generic biochemical and specific biological functions
Slide 2
A protein may be assigned to only one
homeomorphic family, which may have zero or more
child nodes and zero or more parent nodes. Each
homeomorphic family may have as many domain
superfamily parents as its members have domains.
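The membership rules above can be sketched as a small graph model. This is only an illustration, not the PIRSF implementation: the class names, the level labels, and the placeholder IDs (`HFAM_A`, `DOMAIN_X`, and so on) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FamilyNode:
    """One node in the classification graph (hypothetical model)."""
    node_id: str
    level: str  # assumed labels: "superfamily", "homeomorphic", "subfamily"
    parents: set = field(default_factory=set)
    children: set = field(default_factory=set)


class Classification:
    """Holds the nodes and the protein-to-homeomorphic-family mapping."""

    def __init__(self):
        self.nodes = {}
        self.protein_to_hfam = {}

    def add_node(self, node_id, level):
        self.nodes[node_id] = FamilyNode(node_id, level)

    def link(self, parent_id, child_id):
        # A node may have zero or more parents and zero or more children.
        self.nodes[parent_id].children.add(child_id)
        self.nodes[child_id].parents.add(parent_id)

    def assign(self, protein_id, family_id):
        # A protein may belong to only one homeomorphic family.
        if self.nodes[family_id].level != "homeomorphic":
            raise ValueError(f"{family_id} is not a homeomorphic family")
        current = self.protein_to_hfam.get(protein_id)
        if current is not None and current != family_id:
            raise ValueError(f"{protein_id} is already assigned to {current}")
        self.protein_to_hfam[protein_id] = family_id


# Placeholder data: a two-domain family with two domain superfamily parents.
graph = Classification()
graph.add_node("DOMAIN_X", "superfamily")
graph.add_node("DOMAIN_Y", "superfamily")
graph.add_node("HFAM_A", "homeomorphic")
graph.add_node("HFAM_B", "homeomorphic")
graph.add_node("SUBFAM_A1", "subfamily")
graph.link("DOMAIN_X", "HFAM_A")
graph.link("DOMAIN_Y", "HFAM_A")
graph.link("HFAM_A", "SUBFAM_A1")
graph.assign("protein_1", "HFAM_A")
```

Storing the protein assignment as a plain dictionary makes the "only one homeomorphic family" rule easy to enforce, while the parent and child sets leave room for multiple domain superfamily parents.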
Slide 3: Creation and Curation of PIRSFs
- Computer-Generated (Uncurated) Clusters
- Preliminary Curation
  - Membership
  - Signature Domains
- Full Curation
  - Family Name, Description, Bibliography
  - PIRSF Name Rules
Slide 4: PIRSF family classification system
http://pir.georgetown.edu/pirwww/dbinfo/pirsf.shtml
Slide 5: PIRSF Text Search
Ways to get to PIRSF text search
Add extra input boxes for advanced search
Select field
Slide 6: PIRSF Text Search Result (I)
Things you can do from the result table:
1. Add search terms or start the search over
2. Customize the table columns
3. Save your results in table or FASTA format
4. Select entries using check boxes and perform analysis using tool bar options
5. Follow links to PIRSF records, the PIRSF hierarchy, and protein domains (Pfam)
Slide 7: PIRSF Text Search Result (II)
2. How to customize the table columns. Example: display the KEGG pathway ID column.
a- Select KEGGPathway ID in the "Fields not in display" box.
c- KEGG ID should now be listed under "Fields in display". Press the Apply button for the changes to take effect.
Slide 8: PIRSF Text Search Result (III)
3. Save your results in table or FASTA format
a- Select entries using the check boxes in the PIRSF column. To select all, check the box in the column heading.
Slide 9: PIRSF Text Search Result (IV)
4. Select entries using check boxes and perform analysis using tool bar options
a- Select families using the check boxes in the PIRSF ID column. To select all, check the box in the column heading. Then select a tool, e.g., Taxonomy Distribution.
Taxonomy Distribution displays the taxonomic distribution for the selected families. In this case, PIRSF001501 and PIRSF017318 contain members of the AroQ class from prokaryotes and eukaryotes, respectively, which is also reflected in the family names.
Slide 10: PIRSF Text Search Result (V)
- Note on selecting families for analysis with Multiple Alignment and Domain Display
  - If more than one family is selected, the chosen tool performs the operation on representative members of the selected families. Example: multiple alignment of PIRSF001501, PIRSF500251, PIRSF026640, and PIRSF029775.
  - If one family is selected, the chosen tool performs the operation on the seed members. Example: multiple alignment of PIRSF001501.
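The selection rule above (one family uses seed members, several families use representative members) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the dictionary layout, and the member labels are hypothetical, and only the family IDs come from the slide.

```python
def members_for_tool(selected_ids, families):
    """Return the member list a tool would operate on.

    `families` maps a family ID to a dict with "seed" and
    "representative" member lists (a hypothetical layout).
    """
    if not selected_ids:
        raise ValueError("select at least one family")
    if len(selected_ids) == 1:
        # Single family: operate on its seed members.
        return list(families[selected_ids[0]]["seed"])
    # Several families: pool the representative members of each one.
    members = []
    for family_id in selected_ids:
        members.extend(families[family_id]["representative"])
    return members


# Placeholder member labels; they are not real accessions.
families = {
    "PIRSF001501": {"seed": ["seed_a", "seed_b", "seed_c"],
                    "representative": ["rep_a"]},
    "PIRSF500251": {"seed": ["seed_d", "seed_e"],
                    "representative": ["rep_d"]},
}

single = members_for_tool(["PIRSF001501"], families)
multi = members_for_tool(["PIRSF001501", "PIRSF500251"], families)
```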
Slide 11: PIRSF Text Search Result (VI)
5. The result table contains summarized information about family size, domain architecture, and level of curation. Additional data can be viewed by using the Display Option.
PIRSF Name: The names assigned to PIRSFs predominantly reflect the membership. The main source of PIRSF names is the literature. Fully curated families have a name accompanied, in most cases, by an evidence tag:
- Validated: at least one member of the family has an experimentally determined function.
- Predicted: the family's function is inferred computationally based on sequence similarity and/or functional associative analysis.
- Tentative: experimental evidence is not decisive.
Curation Status: Indicates the level of manual curation of the PIRSF.
- Uncurated: Computer-generated protein clusters with no manual curation. The clusters are computationally defined using both pairwise-based parameters (sequence identity, sequence length ratio, and overlap length ratio) and cluster-based parameters (matched members, distance to neighboring clusters, and overall domain arrangement).
- Preliminary: Computer-generated clusters are manually curated for membership (do proteins belong to the assigned cluster?) and domain architecture (Pfam domains listed from N- to C-termini).
- Full/Full (with description): A name is assigned to the protein family, and accompanying references are listed when available. In many cases, brief descriptions are also provided.
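The three pairwise parameters named under Uncurated can be illustrated with a short sketch. The slides do not define the parameters or give thresholds, so the definitions below are assumptions: identity is taken over columns where both sequences have a residue, length ratio is shorter over longer, and overlap ratio is the number of such columns over the longer sequence length. The function name and the toy sequences are hypothetical.

```python
def pairwise_parameters(aligned_a, aligned_b, gap="-"):
    """Compute assumed pairwise measures from two gapped, aligned strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    len_a = len(aligned_a.replace(gap, ""))
    len_b = len(aligned_b.replace(gap, ""))
    if len_a == 0 or len_b == 0:
        raise ValueError("sequences must contain at least one residue")

    # Columns where both sequences have a residue (no gap on either side).
    paired = [(x, y) for x, y in zip(aligned_a, aligned_b)
              if x != gap and y != gap]
    overlap = len(paired)
    identical = sum(1 for x, y in paired if x == y)

    longer = max(len_a, len_b)
    return {
        "identity": identical / overlap if overlap else 0.0,
        "length_ratio": min(len_a, len_b) / longer,
        "overlap_ratio": overlap / longer,
    }


# Toy alignment: 7 and 6 residues, 5 paired columns, 4 of them identical.
params = pairwise_parameters("MKT-AYLL", "MRTGAY--")
```

In a clustering step, values like these would be compared against cutoffs; the actual cutoffs used for PIRSF are not stated in the slides and are left out here.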
Hfam/Superfam/Subfam: Indicates the hierarchical level of the PIRSF: homeomorphic family, superfamily, or subfamily, respectively. Selecting the button shows the PIRSF hierarchy in a DAG view with Pfam as the top node.
Slide 12: 5. PIRSF hierarchy in DAG view (cont.)
Pfam level
Hfam level
Subfam level
Slide 13: PIRSF Family Report (I): Curated Protein Family Information
Level of manual curation
Slide 14: PIRSF Family Report (II)
Integrated value-added information from other
databases
Mapping to other protein classification databases
Slide 15: PIRSF Batch Retrieval
Retrieve PIRSF families by selecting a specific
identifier or a combination of identifiers.
Define IDs
Display the list of query/PIRSF matches
List IDs
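Before pasting a list into a batch form, it can help to normalize it. The sketch below handles PIRSF IDs only and assumes they follow the pattern seen in these slides ("PIRSF" plus six digits, e.g., PIRSF001501); that pattern is inferred from the examples, not from a format specification, and the function name is hypothetical. Other identifier types accepted by the batch tool are not covered.

```python
import re

# Assumed pattern, inferred from IDs shown in the slides.
PIRSF_ID = re.compile(r"^PIRSF\d{6}$")


def parse_id_list(text):
    """Split free text into unique, valid PIRSF IDs and rejected tokens."""
    valid = []
    invalid = []
    # Accept whitespace, commas, or semicolons as separators.
    for token in re.split(r"[\s,;]+", text.strip()):
        if not token:
            continue
        candidate = token.upper()
        if PIRSF_ID.match(candidate):
            if candidate not in valid:  # drop duplicates, keep input order
                valid.append(candidate)
        else:
            invalid.append(token)
    return valid, invalid


valid_ids, rejected = parse_id_list(
    "PIRSF001501, pirsf017318\nPIRSF001501 foo"
)
```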
Slide 16: PIRSF SCAN (sequence search)
Slide 17: PIRSF SCAN (sequence search)
Returns only matches to fully curated PIRSFs.
UniProtKB sequence Q8Y5X7 is automatically classified as chorismate mutase of the AroH class (PIRSF005965).