Creation and Maintenance of GeneKeyDB

Slides: 15
Provided by: kevink53
Learn more at: https://www.baylor.edu

Transcript and Presenter's Notes

1
Creation and Maintenance of GeneKeyDB
  • Research being conducted by
  • Kevin Kastner
  • Under the direction of
  • Dr. Erich Baker

2
The Problem
  • There exist thousands of biomedical data
    sources.
  • In 2006, there were 557 relevant public
    resources in molecular biology.
  • This number is growing rapidly:
  • 203 sources in 1999
  • 226 sources in 2000
  • 277 sources in 2001.

3
The Problem
  • Traditional database approaches are too
    structured.
  • Scientific objects change identification over
    time.
  • Gene names change over time.
  • The Human Genome Nomenclature Database (HUGO)
    contains 13,594 active symbols, 9,635 literature
    aliases, and 2,739 withdrawn symbols.
  • SIR2L1 (withdrawn) is a synonym for SIRT1 and
    sir2-like 1.
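
The renaming problem on this slide can be illustrated with a small lookup that resolves any known name to the currently active symbol. This is only a minimal sketch, not GeneKeyDB's implementation: the `ALIASES` table and the `resolve_symbol` function are hypothetical names introduced here, and only the SIRT1 entries come from the slide. Python is used for brevity.

```python
# Hypothetical alias table: every known name maps to its active symbol.
# Only the SIRT1 entries are taken from the slide; the structure is assumed.
ALIASES = {
    "SIRT1": "SIRT1",        # active symbol
    "SIR2L1": "SIRT1",       # withdrawn symbol
    "sir2-like 1": "SIRT1",  # synonym listed on the slide
}

# Case-insensitive index built once from the table.
_INDEX = {name.lower(): active for name, active in ALIASES.items()}


def resolve_symbol(name):
    """Return the active symbol for a name, or None if the name is unknown."""
    return _INDEX.get(name.strip().lower())


if __name__ == "__main__":
    for query in ("SIR2L1", "sir2-like 1", "SIRT1", "unknown"):
        print(query, "->", resolve_symbol(query))
```

Keeping withdrawn symbols in the table, rather than deleting them, is what lets older literature and data sets still resolve to the current gene.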

4
Scientific Object Identities
5
The Solution
  • GeneKeyDB
  • A gene-centered relational database developed to
    enhance data mining in biological data sets.
  • GeneKeyDB relies primarily on existing database
    identifiers derived from community databases
    (NCBI, GO, Ensembl, et al.) as well as the known
    relationships among those identifiers.
  • Version 1 is already out!
  • http://www.biomedcentral.com/1471-2105/6/72
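
The gene-centered idea can be sketched as one table of genes plus one table of external identifiers that point back to a gene, so that any identifier can be translated into any other through the gene it shares. The schema below is an assumption made for illustration and is not the GeneKeyDB schema; the table names `gene` and `xref`, the function names, and all identifier values are hypothetical placeholders.

```python
import sqlite3

# Hypothetical gene-centered schema: one row per gene, plus one row per
# external identifier that points back to that gene.
SCHEMA = """
CREATE TABLE gene (gene_key INTEGER PRIMARY KEY, symbol TEXT NOT NULL);
CREATE TABLE xref (
    gene_key INTEGER NOT NULL REFERENCES gene(gene_key),
    source   TEXT NOT NULL,   -- e.g. 'NCBI', 'GO', 'Ensembl'
    ext_id   TEXT NOT NULL,   -- identifier as issued by that source
    UNIQUE (source, ext_id)
);
"""


def build_demo_db():
    """Create an in-memory database filled with placeholder identifiers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO gene VALUES (1, 'GENE_A')")
    conn.executemany(
        "INSERT INTO xref VALUES (?, ?, ?)",
        [(1, "NCBI", "ncbi-placeholder-1"),
         (1, "Ensembl", "ensembl-placeholder-1"),
         (1, "GO", "go-placeholder-1")],
    )
    return conn


def translate(conn, source, ext_id, target):
    """Map an identifier from one source to the identifiers of another."""
    rows = conn.execute(
        """SELECT t.ext_id
           FROM xref AS s JOIN xref AS t ON s.gene_key = t.gene_key
           WHERE s.source = ? AND s.ext_id = ? AND t.source = ?
           ORDER BY t.ext_id""",
        (source, ext_id, target),
    )
    return [r[0] for r in rows]


if __name__ == "__main__":
    db = build_demo_db()
    print(translate(db, "NCBI", "ncbi-placeholder-1", "Ensembl"))
```

Because every identifier hangs off a single gene key, adding a new community database means adding rows rather than new pairwise mapping tables.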

6
Weaknesses of Version 1
  • Can no longer be updated
  • Complex queries must be made to the database in
    order to obtain desired information

7
(No Transcript)
8
Complex Queries
    SELECT ll_xp_cdd.cdd_name, ll_np_cdd.cdd_name, organism
    FROM ll_xp_cdd, ll_np_cdd, ll_locus
    WHERE ll_xp_cdd.cdd_score ll_np_cdd.cdd_score
      AND ll_id IN
        (SELECT ll_id
         FROM ll_refseq_xm
         WHERE ll_refseq_xm_id IN
           (SELECT ll_refseq_xm_id
            FROM ll_xp_cdd, ll_np_cdd
            WHERE ll_xp_cdd.cdd_score ll_np_cdd.cdd_score))
      AND ll_id IN
        (SELECT ll_id
         FROM ll_refseq_nm
         WHERE ll_refseq_nm_id IN
           (SELECT ll_refseq_nm_id
            FROM ll_xp_cdd, ll_np_cdd
            WHERE ll_xp_cdd.cdd_score ll_np_cdd.cdd_score))

9
Current Research
  • Creation of APIs to validate data in the database
    and to enable querying to become much easier for
    the user.
  • One-step updating of the database and the
    information it contains.
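
One plausible reading of "validate data in the database" is a referential-integrity check: find rows in one table whose locus identifier has no matching row in the locus table. The sketch below works under that assumption. The function `find_orphans` is hypothetical, and although the table names `ll_locus` and `ll_refseq_nm` and the column `ll_id` appear in the query on the earlier slide, the column layouts used here are guesses.

```python
import sqlite3


def find_orphans(conn, child_table, parent_table, key):
    """Return key values present in child_table but missing from parent_table.

    Table and column names are interpolated directly into the SQL, so they
    must come from trusted code, never from user input.
    """
    sql = (
        f"SELECT DISTINCT c.{key} FROM {child_table} AS c "
        f"LEFT JOIN {parent_table} AS p ON c.{key} = p.{key} "
        f"WHERE p.{key} IS NULL ORDER BY c.{key}"
    )
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Assumed column layouts with placeholder rows; locus 2 is missing.
    conn.executescript("""
        CREATE TABLE ll_locus (ll_id INTEGER PRIMARY KEY, organism TEXT);
        CREATE TABLE ll_refseq_nm (ll_refseq_nm_id INTEGER, ll_id INTEGER);
        INSERT INTO ll_locus VALUES (1, 'organism A');
        INSERT INTO ll_refseq_nm VALUES (10, 1), (11, 2);
    """)
    print(find_orphans(conn, "ll_refseq_nm", "ll_locus", "ll_id"))
```

A check like this could run as one stage of a one-step update, flagging inconsistent rows before they reach users.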

10
API Alternative
    // fxn(search_params, desired_info), returns ll_id
    curated.cdd(score, null)
    curated_score ← score
    locus_id1 ← gaa.cdd((name, score), score)
    gaa_name ← name
    gaa_score ← score
    locus_id2 ← curated.cdd(name, score)
    curated_name ← name
    locus_id ← intersect(locus_id1, locus_id2)
    locus(organism, locus_id)
    print(gaa_name, curated_name, organism)
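
The call pattern on this slide (query each source, intersect the locus ids, then look up the organism) might look roughly like the following. This is a sketch under stated assumptions, not the planned API: the transcript does not show how scores are compared, so a simple minimum-score filter is assumed, each locus is given at most one hit per source, and `CddHit`, `report`, and all data rows are hypothetical placeholders. Only the names `cdd`, `intersect`, `gaa`, and `curated` echo the slide.

```python
from typing import NamedTuple


class CddHit(NamedTuple):
    ll_id: int        # locus identifier
    cdd_name: str     # conserved-domain name
    cdd_score: float  # score attached to the hit


# Placeholder rows standing in for the two sources named on the slide.
CURATED = [CddHit(1, "domain_A", 50.0), CddHit(2, "domain_B", 80.0)]
GAA = [CddHit(1, "domain_C", 70.0), CddHit(3, "domain_D", 90.0)]
ORGANISM = {1: "organism X", 2: "organism Y", 3: "organism Z"}


def cdd(hits, min_score=None):
    """Return {ll_id: hit} for hits at or above min_score (all if None)."""
    return {h.ll_id: h for h in hits
            if min_score is None or h.cdd_score >= min_score}


def intersect(a, b):
    """Locus ids present in both result sets, in sorted order."""
    return sorted(a.keys() & b.keys())


def report(min_score):
    """Names from both sources, plus organism, for loci found in both."""
    gaa_hits = cdd(GAA, min_score)   # assumed filter: score >= min_score
    curated_hits = cdd(CURATED)
    return [(gaa_hits[i].cdd_name, curated_hits[i].cdd_name, ORGANISM[i])
            for i in intersect(gaa_hits, curated_hits)]


if __name__ == "__main__":
    for row in report(min_score=60.0):
        print(*row)
```

Each step returns a plain collection keyed by locus id, so the nested subqueries of the SQL version become a sequence of small calls that can be inspected one at a time.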

11
External Implementations
  • Some other databases, such as Ensembl, have APIs
    as well.
  • The Ensembl APIs are written in Perl.
  • APIs for GeneKeyDB will be written in Java.
  • Java is a more structured language.
  • Java code is easier to read.

12
The Future of GeneKeyDB
  • GeneKeyDB will join even more external and widely
    used databases together.
  • Code for updating GeneKeyDB will tie into
    database information that will change in expected
    ways.
  • This lowers the number of code rewrites required.
  • GeneKeyDB will be dynamically updated.

13
The Future of GeneKeyDB
  • APIs will also be written in Perl.
  • Perl is used often, almost exclusively, by
    biologists.
  • Can have Perl APIs tie into Java APIs, rather
    than creating all new ones.

14
Comments? Questions?
  • http://genereg.ornl.gov/gkdb/