Comparing Two Approaches of WDK Federation - PowerPoint PPT Presentation

1 / 20
About This Presentation
Title:

Comparing Two Approaches of WDK Federation

Description:

Contr. Browser. CryptoDB. Redux, UGA. YYYY, UPenn. ToxoDB. ApiDB. Redux ... Contr. Browser. wdk.model.jspwrap. wdk.model: Question Query QueryInstaResultFactory ... – PowerPoint PPT presentation

Number of Views:49
Avg rating:3.0/5.0
Slides: 21
Provided by: sam120
Category:

less

Transcript and Presenter's Notes

Title: Comparing Two Approaches of WDK Federation


1
Comparing Two Approaches of WDK Federation
  • ----Database Link vs. WS

2
Background
We have three Apicomplexan organisms Plamso,
Crypto and Tox Question how can we provide an
uniform access way to end user
3
ApiDB
  • GUS(Genomics Unified Schema)all of them share
    the same database schema, accommodating
    information emerging from DNA sequencing and
    functional and comparative genomics projects
  • WDKthe front-end GUI for end user based on MVC
  • ApiDBWDK website based upon database federation
    on the same schema(GUS) (testing site)

4
Five Strategies for Federation
  • DB Link
  • Small centralized ApiDB with DB Links to
    component DBs
  • Web Services
  • No fixed centralized ApiDB with minimal
    centralized caching
  • JDBC
  • Analogous to WS except accessing component dbs
    through port 1521
  • Central ApiDB Data Warehouse
  • Automatic updates via streams
  • Manual updates via import/export or data pump
    scripts
  • Portlets
  • Simultaneous and convenient access to ApiDB and
    the component DBs

5
Similarities between DB Link, WS and JDBC
Approaches
  • Maintain much of the data only in the component
    DBs
  • Require no major modifications to WDK
  • Primary key definition needs to be modified in
    WDK

6
Differences between DB Link, WS and JDBC
Approaches
  • DB Link
  • need to copy coreprojectInfo table and CLOB data
    tables to ApiDB central site
  • synchronization problem (difficult to keep the
    central site data up-to-date)
  • Web Service
  • no centralized data (except for cache PK tables)
  • access DBs through WS
  • CLOB problem
  • JDBC
  • no centralized data (except for cache PK tables)
  • access DBs through JDBC (port 1521)
  • security problem (port 1521 open to certain IP
    addresses)

7
Portal As Future Work
  • A useful approach when user login is enabled
  • Apache JetSpeed2 currently evolving, will pursue
    the portal approach later
  • Can include portlets for ApiDB and each component
    DB respectively
  • The ApiDB portlet can itself be implemented with
    other federation strategies

  • for more information see Portals.ppt

8
Central Warehouse Lower Priority
  • Component DBs constantly growing, difficult to
    centralize all the data
  • Schemas must be used uniformly
  • Keeping the warehouse up-to-date is an issue
  • Not an option until weve explored all the other
    strategies

9
First Two Candidate Approaches
  • DB Link
  • Allows Oracle to handle data distribution,
    therefore all sorts of operations can be
    performed on the component data
  • Likely to perform well between Oracle DBs
  • Web Services
  • DB Link is an Oracle only solution otherwise, to
    connect to non-Oracle DBs, we need Transparent
    Gateway, so we want more platform-independent
    solution

10
Pros and Cons
11
DB Link Strategy
Tomcat
1
6
Changes to WDK
jspwrap
Model
Implementation

Show model.xml
SqlUtils
2
5
naimp
naimp
naimp
aaimp
aaimp
aaimp
Redux, UGA
3
cryptodots.nasequenceimp
3
3
4
4
4
Distributed Query Using DB link
Currently an extra table (app.srt) needs to be
streamed up from CryptoDB On redux for initial
testing
12
WS Strategy
Tomcat
wdk.model.jspwrap
2
5
wdk.model Question ? Query ?QueryInstance ?
ResultFactory

wdk.model.implementation Oracle SqlUtils
WsUtils
WSDL
1
4
4
1
4
1
JDBC
3
3
SOAP
13
Federation Screenshot
14
Realistic Model
  • Four common questions between Crypto and Plasmo
  • Finding genes by gene ID
  • Finding genes by Annotated Keyword
  • Finding genes by pfam domain
  • Finding genes by Exon Count
  • Deal with differences between Crypto model and
    Plasmo model

15
Cryptos Organism--sqlQuery
  • lt!-- VocabQueries.Organisms --gt
  • ltsqlQuery name"Organisms"gt
  • ltcolumn name"internal"/gt
  • ltcolumn name"term"/gt
  • ltcolumn name"idx"/gt
  • ltsqlgt
  • SELECT FROM (
  • SELECT 'C. hominis' as term, 3 as idx,
  • '''' 'Cryptosporidium hominis' '''' as
    internal
  • FROM dual
  • UNION
  • SELECT 'C. parvum' as term, 2 as idx,
  • '''' 'Cryptosporidium parvum' '''' as
    internal
  • FROM dual
  • UNION
  • SELECT 'C. parvum and C. hominis' as
    term, 1 as idx,
  • '''' 'Cryptosporidium hominis' ''',
  • ''' 'Cryptosporidium parvum' ''''
    as internal
  • FROM dual

16
Plasmos Organism--sqlQuery
  • ltsqlQuery name"Organism"gt
  • ltcolumn name"internal"/gt
  • ltcolumn name"term"/gt
  • ltcolumn name"idx"/gt
  • ltsqlgt
  • SELECT name AS term, taxon_id AS
    internal, rownum AS idx
  • FROM sres.TaxonName
  • WHERE name_class'scientific name'
  • AND taxon_id in (SELECT taxon_id
    FROM dots.ExternalNaSequence)
  • UNION
  • SELECT 'any' as term, -1 as internal,
  • count() 1 as idx
  • FROM sres.TaxonName
  • WHERE name_class'scientific name'
  • AND taxon_id in (SELECT taxon_id
    FROM dots.ExternalNaSequence)
  • lt/sqlgt
  • lt/sqlQuerygt

17
  • The End
  • Thank You!

18
Product Attribute Query
  • (select u.product as product
  • from (
  • SELECT r.source_id,
  • r.product
  • FROM dots.rnatype_at_CRYPTOA_DBLINK_at_ r
  • UNION
  • SELECT t.source_id,
  • t.product
  • FROM dots.transcript_at_CRYPTOA_DBLIN
    K_at_ t
  • ) u, dots.genefeature_at_CRYPTOA_DBLI
    NK_at_ g
  • WHERE g.source_id u.source_id()
  • AND g.source_id 'primaryKey'
  • AND g.row_project_id projectId)
  • UNION
  • (SELECT gf.product
  • FROM dots.GeneFeature_at_PLASMO_DBLIN
    K_at_ gf
  • WHERE gf.source_id'primaryKey
    '
  • AND gf.row_project_id
    projectId)

19
Organism Attribute Query
  • (SELECT
  • SUBSTR(tn.name, 1, 1) '.'
    SUBSTR(tn.name, INSTR(tn.name, ' ', 1, 1) 1) as
    organism
  • FROM dots.genefeature_at_CRYPTOA_DBLINK_at_
    g,
  • dots.externalnasequence_at_CRYPTOA_DBLINK
    _at_ enas,
  • dots.nalocation_at_CRYPTOA_DBLINK_at_ nal,
  • dots.source_at_CRYPTOA_DBLINK_at_ src,
    sres.taxonname_at_CRYPTOA_DBLINK_at_ tn
  • WHERE g.na_sequence_id
    enas.na_sequence_id
  • AND enas.na_sequence_id src.na_sequence_id
  • AND nal.na_feature_id g.na_feature_id
  • AND enas.taxon_id tn.taxon_id
  • AND g.source_id 'primaryKey'
  • AND g.row_project_id projectId)
  • UNION
  • ( SELECT SUBSTR(tn.name, 1, 1) '.
    ' SUBSTR(tn.name, INSTR(tn.name, ' ', 1, 1)
    1)
  • as organism
  • FROM dots.GeneFeature_at_PLASMO_DBLINK
    _at_ gf,
  • dots.NaSequence_at_PLASMO_DBLINK_at_
    s,
  • sres.TaxonName_at_PLASMO_DBLINK_at_
    tn
  • WHERE gf.na_sequence_id
    s.na_sequence_id

20
Sequence Attribute Query
  • (SELECT sequence
  • FROM DoTS.GeneFeature_at_CRYPTOA_DB
    LINK_at_ g,
  • _at_CRYPTOA_SCHEMA_at_SRT s
  • WHERE g.source_id
    'primaryKey'
  • AND g.source_id s.sequence_id()
  • AND g.row_project_id
    projectId)
  • UNION all
  • (SELECT ens.sequence
  • FROM dots.GeneFeature_at_PLASMO_DBLIN
    K_at_ gf,
  • _at_PLASMO_SCHEMA_at_ExternalNaSequence
    ens
  • WHERE gf.source_id'primaryKey
    '
  • AND gf.na_sequence_id
    ens.na_sequence_id
  • AND gf.row_project_id
    projectId)
Write a Comment
User Comments (0)
About PowerShow.com