Anastasia Nikolskaya - PowerPoint PPT Presentation

About This Presentation
Title:

Anastasia Nikolskaya

Description:

Evolution by recent duplication and loss. Origin traceable to a single gene in LCA ... Creation and curation of PIRSFs. UniProt proteins. Preliminary ... – PowerPoint PPT presentation

Slides: 17
Provided by: wuc

Transcript and Presenter's Notes


1

COMPLEMENTING GENE ONTOLOGY WITH PIRSF
CLASSIFICATION-BASED PROTEIN ONTOLOGY
  • Anastasia Nikolskaya
  • PIR (Protein Information Resource)
  • Georgetown University Medical Center
  • www.uniprot.org, http://pir.georgetown.edu/

2
Why Protein Classification?
  • Automatic annotation of protein sequences based
    on protein families (propagation of annotation)
  • Systematic correction of annotation errors
  • Protein name standardization in UniProt
  • Functional predictions for uncharacterized
    protein families

3
PIRSF Classification System
  • PIRSF: A network structure with hierarchies from
    superfamilies to subfamilies that reflects
    evolutionary relationships of full-length
    proteins
  • Definitions
  • Basic unit: Homeomorphic Family
  • Homologous (common ancestry): Inferred by
    sequence similarity
  • Homeomorphic: Full-length sequence similarity and
    common domain architecture
  • Network structure: Flexible number of levels with
    varying degrees of sequence conservation
  • Advantages
  • Annotation of both generic biochemical and
    specific biological functions
  • Accurate propagation of annotation and
    development of standardized protein nomenclature
    and ontology

4
Levels of protein classification

5
PIRSF Classification System
A protein may be assigned to only one
homeomorphic family, which may have zero or more
child nodes and zero or more parent nodes. Each
homeomorphic family may have as many domain
superfamily parents as its members have domains.
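
The assignment rules on this slide can be sketched as a small data structure. This is a minimal illustration, not the PIR implementation: the class names, level labels, and all identifiers (`DOMAIN_SF_A`, `FAMILY_1`, `PROTEIN_X`, and so on) are hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node in the classification network (illustrative only)."""
    node_id: str
    level: str  # "domain_superfamily", "family", or "subfamily"
    parents: set = field(default_factory=set)
    children: set = field(default_factory=set)


class Classification:
    def __init__(self):
        self.nodes = {}      # node_id -> Node
        self.family_of = {}  # protein accession -> homeomorphic family id

    def add_node(self, node_id, level):
        self.nodes[node_id] = Node(node_id, level)

    def link(self, parent_id, child_id):
        """Record a parent/child edge; a node may have several parents."""
        self.nodes[parent_id].children.add(child_id)
        self.nodes[child_id].parents.add(parent_id)

    def assign(self, protein, family_id):
        """Assign a protein to exactly one homeomorphic family."""
        if self.nodes[family_id].level != "family":
            raise ValueError(f"{family_id} is not a homeomorphic family")
        current = self.family_of.get(protein)
        if current is not None and current != family_id:
            raise ValueError(f"{protein} already belongs to {current}")
        self.family_of[protein] = family_id


# Placeholder identifiers, not real PIRSF or UniProt entries.
net = Classification()
for node_id, level in [("DOMAIN_SF_A", "domain_superfamily"),
                       ("DOMAIN_SF_B", "domain_superfamily"),
                       ("FAMILY_1", "family"),
                       ("FAMILY_2", "family"),
                       ("SUBFAMILY_1A", "subfamily")]:
    net.add_node(node_id, level)

# A two-domain family gets one domain-superfamily parent per domain.
net.link("DOMAIN_SF_A", "FAMILY_1")
net.link("DOMAIN_SF_B", "FAMILY_1")
net.link("FAMILY_1", "SUBFAMILY_1A")
net.assign("PROTEIN_X", "FAMILY_1")
```

Because a family can have several parents, the structure is a network rather than a strict tree. A second call such as `net.assign("PROTEIN_X", "FAMILY_2")` raises `ValueError`, which enforces the one-family-per-protein rule.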
6
PIRSF Classification System
  • SF500001: stimulates trophoblast migration
  • SF500002: stimulates proliferation of prostate
    cancer cells
  • SF500003: anti-proliferative and pro-apoptotic
    effects on cancer cells
  • SF500004: inhibitor of IGF
  • SF500005: stimulates bone formation
  • SF500006: inhibitor of IGF-II

7
Creation and curation of PIRSFs
  • Computer-Generated (Uncurated) Clusters (36,000
    PIRSFs)
  • Preliminary Curation (5,000 PIRSFs): Membership,
    Signature Domains
  • Full Curation (1,300 PIRSFs): Family Name with
    evidence tag, Description, Bibliography

[Figure: PIRSF curation workflow. Diagram labels:]
  • Automatic Procedure: UniProt proteins; New
    proteins; Automatic clustering; Automatic
    placement; Preliminary Homeomorphic Families;
    Orphans; Unassigned proteins
  • Computer-assisted Manual Curation: Map domains on
    Families; Merge/split clusters; Add/remove
    members; Curated Homeomorphic Families; Name,
    refs, abstract, domain arch.; Final Homeomorphic
    Families; Protein name rule/site rule; Create
    hierarchies (superfamilies/subfamilies); Build
    and test HMMs

8
PIRSF-Based Protein Annotation in UniProt
UniProt is developing protein name standards and
guidelines. Classification of proteins into
families provides a convenient and accurate
mechanism to propagate curated information to
individual protein members.
  • Rule-Based annotation system using curated
    PIRSFs
  • Site Rules (PIRSR): Position-specific site
    features (active sites, binding sites, modified
    sites, other functional sites)
  • Name Rules (PIRNR): Transfer name from PIRSF to
    individual proteins (define a subgroup if
    necessary)
  • Protein Name (may differ from family name),
    synonyms, acronyms
  • EC
  • Misnomers
  • GO Terms (homeomorphic family-based, propagatable
    GO annotation)
  • Function
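
The name-rule idea above (transfer a curated name, EC number, and GO terms from a family to its members, optionally restricted to a subgroup) can be sketched as follows. This is an assumption-laden illustration, not the actual PIRNR format: `NameRule`, its fields, the first-match policy, and all identifiers and names in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class NameRule:
    """A hypothetical family-based name rule (not the real PIRNR schema)."""
    rule_id: str
    family_id: str
    protein_name: str
    ec_number: Optional[str] = None
    go_terms: tuple = ()
    # Optional predicate that restricts the rule to a subgroup of members.
    subgroup: Optional[Callable[[dict], bool]] = None


def annotate(protein, rules):
    """Return annotations from the first matching rule, or None."""
    for rule in rules:
        if protein["family_id"] != rule.family_id:
            continue
        if rule.subgroup is not None and not rule.subgroup(protein):
            continue
        return {
            "name": rule.protein_name,
            "ec": rule.ec_number,
            "go_terms": list(rule.go_terms),
            "source_rule": rule.rule_id,
        }
    return None


# Placeholder rules: the subgroup-specific rule is listed first so that it
# takes precedence over the general family rule.
rules = [
    NameRule("RULE_1", "FAMILY_1", "Example enzyme, bacterial type",
             go_terms=("GO:placeholder_1",),
             subgroup=lambda p: p["kingdom"] == "Bacteria"),
    NameRule("RULE_2", "FAMILY_1", "Example enzyme"),
]

bacterial = annotate(
    {"accession": "P_1", "family_id": "FAMILY_1", "kingdom": "Bacteria"}, rules)
other = annotate(
    {"accession": "P_2", "family_id": "FAMILY_1", "kingdom": "Eukaryota"}, rules)
unmatched = annotate(
    {"accession": "P_3", "family_id": "FAMILY_9", "kingdom": "Bacteria"}, rules)
```

Here `bacterial` receives the subgroup name, `other` falls through to the general family name, and `unmatched` stays `None` because no rule covers its family.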

9
PIRSF-Based Protein Ontology
  • PIRSF family hierarchy is based on evolutionary
    relationships
  • Standardized PIRSF family names
  • Network structure (a directed acyclic graph, DAG)
    for the PIRSF family classification system

10
PIRSF to GO Mapping
  • PIRSF to GO mapping provides a link between GO
    concepts and protein objects
  • Mapped 5500 curated PIRSF homeomorphic families
    and subfamilies to the GO hierarchy

DynGO viewer (Hongfang Liu, University of Maryland)
  • Superimpose GO and PIRSF hierarchies
  • Bidirectional display (GO-centric or
    PIRSF-centric views)

11
Protein Ontology Can Complement GO
  • Expanding a Node
  • Identification of GO subtrees that need expansion
    if GO concepts are too broad
  • 67% of curated PIRSF families and subfamilies
    map to GO leaf nodes
  • Among these, 2209 PIRSFs have shared GO leaf
    nodes (many PIRSFs to 1 GO leaf)
  • Example: PIRSF001969 vs. PIRSF018239 and
    PIRSF036495 (high- vs. low-affinity IGF binding)
  • Identification of missing GO nodes
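
One way to find GO leaf nodes that may need expansion is to group PIRSFs by the leaf terms they map to and keep the leaves shared by more than one PIRSF. The sketch below assumes two simple dictionaries as input. The three PIRSF identifiers come from the slide; the GO identifiers, the mapping itself, and the function name are placeholders, not real data.

```python
from collections import defaultdict


def shared_go_leaves(pirsf_to_go, go_children):
    """Return {leaf GO term: sorted PIRSF ids} for leaves with >1 PIRSF.

    pirsf_to_go: PIRSF id -> list of GO term ids it maps to
    go_children: GO term id -> list of child term ids (leaves are absent
                 or map to an empty list)
    """
    leaf_to_pirsfs = defaultdict(set)
    for pirsf, terms in pirsf_to_go.items():
        for term in terms:
            if not go_children.get(term):  # no children: a leaf node
                leaf_to_pirsfs[term].add(pirsf)
    return {term: sorted(ids)
            for term, ids in leaf_to_pirsfs.items() if len(ids) > 1}


# Placeholder GO ids and an assumed mapping, for illustration only.
go_children = {"GO:parent": ["GO:shared_leaf", "GO:other_leaf"]}
pirsf_to_go = {
    "PIRSF001969": ["GO:shared_leaf"],
    "PIRSF018239": ["GO:shared_leaf"],
    "PIRSF036495": ["GO:shared_leaf"],
    "PIRSF_OTHER": ["GO:other_leaf"],   # hypothetical id
    "PIRSF_BROAD": ["GO:parent"],       # hypothetical id, non-leaf mapping
}

candidates = shared_go_leaves(pirsf_to_go, go_children)
```

Each key in `candidates` is a leaf term whose PIRSFs a curator could compare, for example to decide whether functionally distinct families justify new child terms.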

12
Protein Ontology Can Complement GO
Identification of Missing GO Nodes (higher levels)

13
Protein Ontology Can Complement GO
  • Linking Function, Biological Process, and
    Cellular Component through a Protein Object Based
    on Protein Annotations
  • Mechanism to examine the relationships between
    the three GO ontologies based on the shared
    annotations at different protein family levels
  • Example: molecular function "estrogen receptor
    activity" and biological processes "signal
    transduction" and "estrogen receptor signaling
    pathway"
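
The linking mechanism can be sketched by counting how often a molecular function term and a biological process term are annotated to the same family. This is only an illustration under assumed inputs: the family identifiers, the dictionary layout, and the function name are hypothetical, and the term names are taken from the example on this slide.

```python
from collections import Counter
from itertools import product


def cross_ontology_links(family_annotations):
    """Count (function, process) term pairs that share a family."""
    links = Counter()
    for annotations in family_annotations.values():
        functions = annotations.get("function", ())
        processes = annotations.get("process", ())
        for pair in product(functions, processes):
            links[pair] += 1
    return links


# Hypothetical families annotated with the terms named on the slide.
family_annotations = {
    "FAMILY_A": {
        "function": ["estrogen receptor activity"],
        "process": ["signal transduction",
                    "estrogen receptor signaling pathway"],
    },
    "FAMILY_B": {
        "function": ["estrogen receptor activity"],
        "process": ["signal transduction"],
    },
}

links = cross_ontology_links(family_annotations)
```

Pairs with high counts suggest a relationship between terms in different GO ontologies that is supported by several protein families. The same loop could be extended to cellular component terms.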

14
PIRSF Protein Classification: a link between GO
and protein objects
  • Annotation Quality
  • Annotation of biological function of whole
    proteins
  • Annotation of uncharacterized hypothetical
    proteins
  • Correction of annotation errors and
    underannotations
  • Standardization of Protein Names
  • PIRSF to GO mapping provides a link between GO
    sub-ontologies and protein objects

15
PIRSF-based Protein Ontology Can Complement GO
  • Identification of GO subtrees that need expansion
    if GO concepts are too broad
  • Comprehensive classification of related protein
    families in PIRSF can help in identification of
    missing GO nodes when entire groups of PIRSF
    superfamilies or families cannot be mapped to
    existing GO terms
  • Mechanism to examine the relationships between
    the three GO ontologies (molecular function,
    biological process, and cellular component), as
    well as between GO concepts, based on the shared
    annotations at different protein family levels

16
Acknowledgements
  • Hongfang Liu, University of Maryland
  • Judith Blake, The Jackson Laboratory
  • Dr. Cathy Wu, Director
  • Protein Classification team: Dr. Winona Barker;
    Dr. Lai-Su Yeh; Dr. Anastasia Nikolskaya;
    Dr. Darren Natale; Dr. Zhangzhi Hu; Dr. Raja
    Mazumder; Dr. CR Vinayaka; Dr. Xianying Wei;
    Dr. Sona Vasudevan
  • Informatics team: Dr. Hongzhan Huang; Baris
    Suzek, M.S.; Sehee Chung, M.S.; Dr. Leslie
    Arminski; Dr. Hsing-Kuo Hua; Yongxing Chen, M.S.;
    Jing Zhang, M.S.; Amar Kalelkar
  • Students: Christina Fang; Vincent Hormoso;
    Natalia Petrova; Jorge Castro-Alvear

PIR Team: http://pir.georgetown.edu/
UniProt (SwissProt, TrEMBL, PIR): www.uniprot.org