Comparing Two Approaches of WDK Federation - PowerPoint PPT Presentation

1 / 20

About This Presentation

Title:

Comparing Two Approaches of WDK Federation

Description:

Contr. Browser. CryptoDB. Redux, UGA. YYYY, UPenn. ToxoDB. ApiDB. Redux ... Contr. Browser. wdk.model.jspwrap. wdk.model: Question Query QueryInstaResultFactory ... – PowerPoint PPT presentation

Number of Views:49

Avg rating:3.0/5.0

Slides: 21

Provided by: sam120

Category:

more less

Transcript and Presenter's Notes

Title: Comparing Two Approaches of WDK Federation

1
Comparing Two Approaches of WDK Federation

----Database Link vs. WS

2
Background
We have three Apicomplexan organisms Plamso,
Crypto and Tox Question how can we provide an
uniform access way to end user
3
ApiDB

GUS(Genomics Unified Schema)all of them share
the same database schema, accommodating
information emerging from DNA sequencing and
functional and comparative genomics projects
WDKthe front-end GUI for end user based on MVC
ApiDBWDK website based upon database federation
on the same schema(GUS) (testing site)

4
Five Strategies for Federation

DB Link
Small centralized ApiDB with DB Links to
component DBs
Web Services
No fixed centralized ApiDB with minimal
centralized caching
JDBC
Analogous to WS except accessing component dbs
through port 1521
Central ApiDB Data Warehouse
Automatic updates via streams
Manual updates via import/export or data pump
scripts
Portlets
Simultaneous and convenient access to ApiDB and
the component DBs

5
Similarities between DB Link, WS and JDBC
Approaches

Maintain much of the data only in the component
DBs
Require no major modifications to WDK
Primary key definition needs to be modified in
WDK

6
Differences between DB Link, WS and JDBC
Approaches

DB Link
need to copy coreprojectInfo table and CLOB data
tables to ApiDB central site
synchronization problem (difficult to keep the
central site data up-to-date)
Web Service
no centralized data (except for cache PK tables)
access DBs through WS
CLOB problem
JDBC
no centralized data (except for cache PK tables)
access DBs through JDBC (port 1521)
security problem (port 1521 open to certain IP
addresses)

7
Portal As Future Work

A useful approach when user login is enabled
Apache JetSpeed2 currently evolving, will pursue
the portal approach later
Can include portlets for ApiDB and each component
DB respectively
The ApiDB portlet can itself be implemented with
other federation strategies
for more information see Portals.ppt

8
Central Warehouse Lower Priority

Component DBs constantly growing, difficult to
centralize all the data
Schemas must be used uniformly
Keeping the warehouse up-to-date is an issue
Not an option until weve explored all the other
strategies

9
First Two Candidate Approaches

DB Link
Allows Oracle to handle data distribution,
therefore all sorts of operations can be
performed on the component data
Likely to perform well between Oracle DBs
Web Services
DB Link is an Oracle only solution otherwise, to
connect to non-Oracle DBs, we need Transparent
Gateway, so we want more platform-independent
solution

10
Pros and Cons
11
DB Link Strategy
Tomcat
1
6
Changes to WDK
jspwrap
Model
Implementation

Show model.xml
SqlUtils
2
5
naimp
naimp
naimp
aaimp
aaimp
aaimp
Redux, UGA
3
cryptodots.nasequenceimp
3
3
4
4
4
Distributed Query Using DB link
Currently an extra table (app.srt) needs to be
streamed up from CryptoDB On redux for initial
testing
12
WS Strategy
Tomcat
wdk.model.jspwrap
2
5
wdk.model Question ? Query ?QueryInstance ?
ResultFactory

wdk.model.implementation Oracle SqlUtils
WsUtils
WSDL
1
4
4
1
4
1
JDBC
3
3
SOAP
13
Federation Screenshot
14
Realistic Model

Four common questions between Crypto and Plasmo
Finding genes by gene ID
Finding genes by Annotated Keyword
Finding genes by pfam domain
Finding genes by Exon Count
Deal with differences between Crypto model and
Plasmo model

15
Cryptos Organism--sqlQuery

lt!-- VocabQueries.Organisms --gt
ltsqlQuery name"Organisms"gt
ltcolumn name"internal"/gt
ltcolumn name"term"/gt
ltcolumn name"idx"/gt
ltsqlgt
SELECT FROM (
SELECT 'C. hominis' as term, 3 as idx,
'''' 'Cryptosporidium hominis' '''' as
internal
FROM dual
UNION
SELECT 'C. parvum' as term, 2 as idx,
'''' 'Cryptosporidium parvum' '''' as
internal
FROM dual
UNION
SELECT 'C. parvum and C. hominis' as
term, 1 as idx,
'''' 'Cryptosporidium hominis' ''',
''' 'Cryptosporidium parvum' ''''
as internal
FROM dual

16
Plasmos Organism--sqlQuery

ltsqlQuery name"Organism"gt
ltcolumn name"internal"/gt
ltcolumn name"term"/gt
ltcolumn name"idx"/gt
ltsqlgt
SELECT name AS term, taxon_id AS
internal, rownum AS idx
FROM sres.TaxonName
WHERE name_class'scientific name'
AND taxon_id in (SELECT taxon_id
FROM dots.ExternalNaSequence)
UNION
SELECT 'any' as term, -1 as internal,
count() 1 as idx
FROM sres.TaxonName
WHERE name_class'scientific name'
AND taxon_id in (SELECT taxon_id
FROM dots.ExternalNaSequence)
lt/sqlgt
lt/sqlQuerygt

The End
Thank You!

18
Product Attribute Query

(select u.product as product
from (
SELECT r.source_id,
r.product
FROM dots.rnatype_at_CRYPTOA_DBLINK_at_ r
UNION
SELECT t.source_id,
t.product
FROM dots.transcript_at_CRYPTOA_DBLIN
K_at_ t
) u, dots.genefeature_at_CRYPTOA_DBLI
NK_at_ g
WHERE g.source_id u.source_id()
AND g.source_id 'primaryKey'
AND g.row_project_id projectId)
UNION
(SELECT gf.product
FROM dots.GeneFeature_at_PLASMO_DBLIN
K_at_ gf
WHERE gf.source_id'primaryKey
'
AND gf.row_project_id
projectId)

19
Organism Attribute Query

(SELECT
SUBSTR(tn.name, 1, 1) '.'
SUBSTR(tn.name, INSTR(tn.name, ' ', 1, 1) 1) as
organism
FROM dots.genefeature_at_CRYPTOA_DBLINK_at_
g,
dots.externalnasequence_at_CRYPTOA_DBLINK
_at_ enas,
dots.nalocation_at_CRYPTOA_DBLINK_at_ nal,
dots.source_at_CRYPTOA_DBLINK_at_ src,
sres.taxonname_at_CRYPTOA_DBLINK_at_ tn
WHERE g.na_sequence_id
enas.na_sequence_id
AND enas.na_sequence_id src.na_sequence_id
AND nal.na_feature_id g.na_feature_id
AND enas.taxon_id tn.taxon_id
AND g.source_id 'primaryKey'
AND g.row_project_id projectId)
UNION
( SELECT SUBSTR(tn.name, 1, 1) '.
' SUBSTR(tn.name, INSTR(tn.name, ' ', 1, 1)
1)
as organism
FROM dots.GeneFeature_at_PLASMO_DBLINK
_at_ gf,
dots.NaSequence_at_PLASMO_DBLINK_at_
s,
sres.TaxonName_at_PLASMO_DBLINK_at_
tn
WHERE gf.na_sequence_id
s.na_sequence_id