EMISPHER 3rd Euro-MEDiterranean Conference

1
  • EMISPHER 3rd Euro-MEDiterranean Conference,
  • Dimitrios Vogiatzis
  • University of Cyprus
  • 26/06/2004

2
Summary of talk
  • Brief presentation of bioGRID project
  • Focus on the computational GRID
  • Presentation of the Prova language (distributed computations)
  • Early evaluation results

3
Scientific Objectives of the bioGRID project
  • BioGrid is a two-year trial IST (FP5) project
    (Sep-02 to Sep-04) with the following objectives:
  • Development and integration of grid technologies
    so that
  • researchers obtain an efficient information
    output
  • Three tools to be integrated:
  • PSIMAP: protein interaction discovery and
    visualisation
  • Space Explorer: gene and protein visualisation
  • Classification Server: text data mining
  • Tools access information resources:
  • Databases (protein, gene expression)
  • Unstructured data (PubMed abstracts)
  • Software tools (TOPS, for protein structural
    comparison)

4
The Grid component of the project
  • Develop a grid technology
  • for integration of protein interaction space,
    gene expression space and document space.
  • Speed up calculations
  • UCY team manager: George Papadopoulos. Members:
    Dimitrios Vogiatzis, Aristos Stavrou
  • Site: www.bio-grid.net

5
Prova language
  • In the project, the language is used as a
    rule-based backbone for the distributed running of
    PSIMAP
  • Prova is derived from Mandarax, a Java-based
    inference system.
  • Prova extends Mandarax by providing a proper
    language syntax,
  • native syntax integration with Java,
  • agent messaging and reaction rules.

6
Design Goal of Prova
  • Marry the benefits of declarative and
    object-oriented programming
  • Combine the syntaxes of Prolog and Java: ultimate
    logic and object-oriented languages
  • Expose logic and agent behaviour as rules
  • Access data sources via wrappers written in Java
    or command-line shells like Perl
  • Make all Java API from available packages
    directly accessible from rules
  • Run within the Java runtime
  • Enable rapid prototyping of applications
  • Offer a rule-based platform for distributed agent
    programming

7
Prova is useful for data integration tasks when
the following is important
  • Location transparency (local, remote, mirrors)
  • Format transparency (database, RDF, XML, HTML,
    flat files, computation resource)
  • Resilience to change (databases and web sites
    change often)
  • Use of open and open source technologies
  • Understandability and modifiability by a non-IT
    specialist
  • Economical knowledge representation
  • Extensibility with additional functionality

8
Examples, using PROVA I
Given the open database connection DB and a
unique protein identifier PDB_ID from PDB, test
whether the provided domains with IDs PXA and
PXB interact (have at least 4 atoms within 5
angstroms):
    scop_dom2dom(DB, PDB_ID, PXA, PXB) :-
        access_data(pdb, PDB_ID, Protein),
        access_dom_atoms(DB, Protein, PXA, DomainA),
        access_dom_atoms(DB, Protein, PXB, DomainB),
        DomainA.interacts(DomainB).

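The interaction test named in this example can be illustrated with a short Python sketch. It is not the PSIMAP code: the function name `domains_interact`, the representation of a domain as a list of (x, y, z) atom coordinates, and the reading of the criterion as "at least 4 atom pairs no more than 5 angstroms apart" are assumptions made for illustration.

```python
import math
from itertools import product

def domains_interact(domain_a, domain_b, cutoff=5.0, min_contacts=4):
    """Return True if at least min_contacts atom pairs lie within cutoff angstroms."""
    contacts = 0
    for atom_a, atom_b in product(domain_a, domain_b):
        if math.dist(atom_a, atom_b) <= cutoff:
            contacts += 1
            if contacts >= min_contacts:
                return True  # stop early once the threshold is reached
    return False

# Hypothetical coordinates, for illustration only.
domain_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
domain_b = [(0.0, 3.0, 0.0), (1.0, 3.0, 0.0)]
far_domain = [(100.0, 0.0, 0.0), (101.0, 0.0, 0.0)]

close_result = domains_interact(domain_a, domain_b)
far_result = domains_interact(domain_a, far_domain)
```

Checking every atom pair is quadratic in the number of atoms; the early return only helps when the domains do interact.
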
9
Examples, using PROVA II
Opening a database:
    location(database, scop, 'jdbc:mysql://comas.soi.city.ac.uk', 'u', 'p').
    location(database, scop, 'jdbc:mysql://localhost', 'u', 'p').
    dbopen(scop, DB).
Querying a database:
    sql_select(DB, From, N1, V1, ..., Nk, Vk)

10
Agent based Prova I
  • Messages are exchanged via:
  • Java Message Service (JMS), a message-oriented
    middleware platform; Joram implementation.
  • A sends a message to B. Once B goes online, the
    messages will be delivered
  • JADE-HTTP: minimal configuration requirements
    compared to JMS -> ad-hoc networks
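The delivery behaviour described for JMS (a message sent while B is offline is delivered once B comes online) can be pictured with a toy in-memory broker. The sketch below does not use JMS, Joram or JADE; the class `Broker` and its methods are hypothetical and only model store-and-forward delivery.

```python
from collections import defaultdict, deque

class Broker:
    """Toy store-and-forward broker: queues messages for offline agents."""

    def __init__(self):
        self.pending = defaultdict(deque)  # agent name -> undelivered messages
        self.online = {}                   # agent name -> delivery callback

    def send(self, recipient, message):
        handler = self.online.get(recipient)
        if handler is not None:
            handler(message)                         # recipient is online
        else:
            self.pending[recipient].append(message)  # keep until it connects

    def connect(self, agent, handler):
        self.online[agent] = handler
        queue = self.pending[agent]
        while queue:                                 # flush backlog in send order
            handler(queue.popleft())

broker = Broker()
received = []
broker.send("B", "hello from A")      # B is offline: the message is queued
broker.send("B", "second message")
broker.connect("B", received.append)  # B comes online: backlog is delivered
broker.send("B", "third message")     # delivered immediately
```
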

11
Features of Prova-AA
  • Agent communication
  • sendMsg:
  • sendMsg(XID,Protocol,Agent,Performative,[Predicate|Args]|Context)
  • rcvMsg rules:
  • rcvMsg(XID,Protocol,From,queryref,[X|Xs]|Context)

12
PROVA architecture
13
Distributed PSIMAP
  • PSIMAP: discover possible protein interactions
  • Database contains 6120 multidomain proteins
  • Prova 1.3, 1.4; PSIMAP prepackaged with Prova

14
PSIMAP
  • PSIMAP is the first complete protein structural
    domain interaction map
  • It shows which kinds of protein domains are found
    to interact structurally.
  • PSIMAP has specific shapes reflecting the types
    of protein domains and their interaction partners

15
On protein interactions
  • Protein interactions provide an important context
    for the understanding of function
  • Get multidomain proteins (3000-30000 residues):
    12000 proteins and growing
  • Determine interaction
  • Check all residue pairs of any domain
  • Possible interaction
  • if the number of residue pairs within a threshold
    (5 Angstroms) is > 5 pairs

16
Protein interaction, computation
17
Algorithm
  • Interaction(Superfamily1, Superfamily2) if
  • PDB(Protein),
  • Domain(Protein,Domain1),
  • Domain(Protein,Domain2),
  • SCOP Superfamily(Domain1, Superfamily1),
  • SCOP Superfamily(Domain2, Superfamily2),
  • InteractionDD(Domain1,Domain2, 5 Ang, 5 Residues)
  • Complexity is O(n log(n))
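Read as a join, the rule above collects every pair of superfamilies for which some protein contains an interacting pair of domains. The Python sketch below mirrors that reading on toy data. The function name, the dictionaries and the `domains_interact` callback (a stand-in for InteractionDD) are hypothetical, and superfamily pairs are treated as unordered, which the slide does not state.

```python
from itertools import combinations

def superfamily_interactions(protein_domains, superfamily_of, domains_interact):
    """Collect superfamily pairs that have at least one interacting domain pair."""
    found = set()
    for protein, domains in protein_domains.items():
        for d1, d2 in combinations(domains, 2):
            if domains_interact(d1, d2):
                pair = tuple(sorted((superfamily_of[d1], superfamily_of[d2])))
                found.add(pair)
    return found

# Hypothetical toy data: two proteins, five domains, three superfamilies.
protein_domains = {"p1": ["d1", "d2", "d3"], "p2": ["d4", "d5"]}
superfamily_of = {"d1": "sfA", "d2": "sfB", "d3": "sfC", "d4": "sfA", "d5": "sfB"}
contacts = {("d1", "d2"), ("d4", "d5")}  # domain pairs assumed to interact

result = superfamily_interactions(
    protein_domains, superfamily_of, lambda a, b: (a, b) in contacts
)
```

Both proteins contribute the same superfamily pair here, so the resulting map has a single edge.
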

18
Purpose of Experiments
  • Discover significance of network delays
  • Manager and worker-plus nodes in remote locations
  • Discover significance of management delays
  • Manager, worker-plus set-up
  • Determine the achieved speed-up

19
Manager, worker-plus local set-up
[Diagram: one Manager and three Worker-plus nodes, with a local copy of the protein structures]
  • Processing: manager, workers
  • Management: manager

20
Distributed set-up
[Diagram: sites RUG, UCY and CITY; one Manager, two Routers and seven Worker-plus nodes; three local copies of the protein structures]

21
Manager
location(database,scop,"jdbc:mysql://xxx.xxx",...).

start_psimap() :-
    dbopen(scop,DB),
    Psimap=psimap.Psimap(DB),
    List=Psimap.divideSuperTaskList(1500),
    assert(processor(Psimap)),
    assert(tasks(List)),
    attach_routers(),
    iam(Me),
    sendMsg(XID,self,Me,tell,ready()).

attach_routers() :-
    router(Router),
    sendMsg(XID,jade,Router,tell,attach()).
attach_routers().

rcvMsg(XID,Protocol,From,tell,ready()) :-
    tasks(List),
    Task=List.removeFirst(),
    execute_task(XID,Protocol,From,Task).

execute_task(XID,self,Me,Task) :-
    println("About to execute ",Task),
    !,
    processor(Psimap),
    spawn(Psimap,executeTask,Task),
    rcvMsg(XID1,self,Me,return,complete(Psimap,executeTask,Task)),
    Result=Psimap.getTaskResult(),
    sendMsg(XID,self,Me,reply,worker(Result,Task)),
    sendMsg(XID,self,Me,tell,ready()).
execute_task(XID,jade,From,Task) :-
    sendMsg(XID,jade,From,submit,worker(Result,Task)).

rcvMsg(XID,Protocol,From,reply,worker(Result,Task)) :-
    store_result(From,Result,Task).

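The manager rules follow a pull pattern: the work is split into 1500 tasks, and each "ready" message takes the next task off the list. A sequential Python sketch of that pattern is given below. `divide_tasks` is a hypothetical stand-in for Psimap.divideSuperTaskList, whose actual splitting strategy the slides do not show, and the node names are invented.

```python
from collections import deque

def divide_tasks(items, n_tasks):
    """Split items into n_tasks contiguous chunks of near-equal size."""
    base, extra = divmod(len(items), n_tasks)
    tasks, start = [], 0
    for i in range(n_tasks):
        size = base + (1 if i < extra else 0)
        tasks.append(items[start:start + size])
        start += size
    return tasks

def run_manager(tasks, nodes):
    """Hand the next task to each node that reports ready, until none remain."""
    queue = deque(enumerate(tasks))
    ready = deque(nodes)
    assignment = {}
    while queue:
        node = ready.popleft()         # a 'ready' message arrives
        index, task = queue.popleft()  # take the first pending task
        assignment[index] = node       # stand-in for executing the task
        ready.append(node)             # the node reports ready again
    return assignment

proteins = list(range(6120))           # placeholder protein identifiers
tasks = divide_tasks(proteins, 1500)
assignment = run_manager(tasks, ["manager", "worker1", "worker2"])
```

In this simulation every node finishes instantly, so tasks are dealt out round-robin; with real workers, faster nodes would report ready more often and receive more tasks.
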
22
Worker
super("prova@SUPERNODE").
location(database,scop,"jdbc:mysql://localhost","guest","guestdb").

start_worker() :-
    dbopen(scop,DB),
    Psimap=psimap.Psimap(DB),
    assert(processor(Psimap)),
    super(Super),
    sendMsg(XID,jade,Super,tell,ready()).

worker(Result,Task) :-
    println("About to execute ",Task),
    iam(Me),
    processor(Psimap),
    spawn(Psimap,executeTask,Task),
    rcvMsg(XID,self,Me,return,complete(Psimap,executeTask,Task)),
    Result=Psimap.getTaskResult(),
    println(Result).

% Reaction rule to submit
rcvMsg(XID,Protocol,From,submit,XXs) :-
    derive(X,XXs),
    sendMsg(XID,Protocol,From,reply,XXs),
    sendMsg(XID,Protocol,From,tell,ready()).

23
Router
location(database,scop,"jdbc:mysql://lh","u","p").

start_router() :-
    dbopen(scop,DB),
    Psimap=psimap.Psimap(DB),
    assert(processor(Psimap)),
    Workers=java.util.LinkedList(),
    assert(workers(Workers)).

rcvMsg(XID,Protocol,Super,tell,attach()) :-
    assert(super(Super)),
    workers(Workers),
    element(Worker,Workers),
    sendMsg(XID,Protocol,Super,tell,ready()).
rcvMsg(XID,Protocol,Super,tell,attach()) :-
    sendMsg(XID,Protocol,Super,tell,ready()).

rcvMsg(XID,Protocol,From,tell,ready()) :-
    println(From," is ready."),
    workers(Workers),
    Workers.addLast(From),
    super(Super),
    sendMsg(XID,Protocol,Super,tell,ready()).

rcvMsg(XID,Protocol,From,submit,XXs) :-
    workers(Workers),
    Worker=Workers.removeFirst(),
    !,
    sendMsg(XID,Protocol,Worker,forward,submit(From,XXs)).
rcvMsg(XID,Protocol,From,submit,XXs) :-
    derive(X,XXs),
    sendMsg(XID,Protocol,From,reply,XXs),
    sendMsg(XID,Protocol,From,tell,ready()).

rcvMsg(XID,Protocol,Router,forward,submit(Super,XXs)) :-
    derive(X,XXs),
    sendMsg(XID,Protocol,Super,reply,XXs),
    sendMsg(XID,Protocol,Router,tell,ready()).

worker(Result,Task) :-
    println("About to execute ",Task),
    iam(Me),
    processor(Psimap),
    spawn(Psimap,executeTask,Task),
    rcvMsg(XID1,self,Me,return,complete(Psimap,executeTask,Task)),
    Result=Psimap.getTaskResult().

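One reading of the two router rules for "submit" is: forward the task to the first worker that has announced itself as ready, and fall back to executing it on the router when no worker is waiting. The Python sketch below models only that dispatch decision. The class `Router`, its method names and the node names are hypothetical, and no messaging is performed.

```python
from collections import deque

class Router:
    """Toy dispatcher: forwards to a ready worker, else runs the task itself."""

    def __init__(self, name):
        self.name = name
        self.workers = deque()  # workers that have reported 'ready'

    def on_ready(self, worker):
        self.workers.append(worker)

    def on_submit(self, task):
        """Return the name of the node that will execute the task."""
        if self.workers:
            return self.workers.popleft()  # forward to the first ready worker
        return self.name                   # no worker waiting: execute locally

router = Router("router1")
router.on_ready("w1")
first = router.on_submit("task-1")   # a worker is available
second = router.on_submit("task-2")  # the worker list is now empty
```
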
24
Evaluation methods
  • Find the processing power of each node
  • Expressed in proteins/minute
  • Caution: proteins are of varying size
  • Method: System.currentTimeMillis(); all results
    are output to a file
  • Find the processing power of a manager/worker
    setup locally
  • Processing power of manager/workers over two
    sites
  • Processing power of a single manager accessing a
    remote database
  • Available processing power
  • RUG: 18 nodes (Linux)
  • UCY: 7 nodes (5 Linux, 2 PCs)
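The proteins/minute figure can be obtained by timing a batch of proteins and dividing the count by the elapsed time. A minimal Python sketch follows. It uses `time.perf_counter()` in place of the Java `System.currentTimeMillis()` mentioned above, and the function name and the dummy workload are assumptions for illustration.

```python
import time

def proteins_per_minute(process, proteins):
    """Time process() over all proteins and return the throughput per minute."""
    start = time.perf_counter()
    for protein in proteins:
        process(protein)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        return float("inf")  # timer resolution too coarse for this batch
    return len(proteins) / elapsed * 60.0

# Dummy workload standing in for the per-protein interaction computation.
rate = proteins_per_minute(lambda p: sum(range(1000)), list(range(50)))
```

Because proteins vary in size, a rate measured on one batch need not carry over to another; timing the same batch on every node keeps the comparison fair.
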

25
1st series of experiments
  • Evaluate speed up locally on UCY
  • 6120 proteins
  • 1500 tasks
  • Prova 1.3

26
2nd series of experiments I
  • Evaluate processing power of each node operating
    alone
  • Expressed in proteins/second

27
2nd series of experiments II
  • Different processing power of nodes should be
    taken into account
  • Processing of the initial proteins is fast; it
    slows down for later ones.

28
3rd series of experiments
  • SET-UP
  • One node @ UCY (cs1005)
  • Worker-plus
  • Local to UCY database
  • 4.37 prots/min
  • One node @ RUG (lilith)
  • Manager
  • Local to RUG database
  • RESULTS
  • Node @ UCY
  • 4.37 prots/min (1.18 times slowdown)
  • Node lilith @ RUG
  • 6.64 prots/min (1.05 times slowdown)
  • Overall speed-up
  • Just 9% slower than adding the processing power
    of both machines
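The overall figure can be checked from the two per-node results. The sketch below assumes that each "slowdown" is the ratio of a node's stand-alone rate to its rate in the distributed run; the slides do not define the term, and the function name is hypothetical.

```python
def combined_loss(distributed_rates, slowdowns):
    """Fraction of the summed stand-alone throughput lost in the distributed run."""
    # Recover each node's stand-alone rate from its distributed rate and slowdown.
    standalone = [rate * factor for rate, factor in zip(distributed_rates, slowdowns)]
    ideal = sum(standalone)          # both machines working with no overhead
    actual = sum(distributed_rates)  # measured in the two-site run
    return 1.0 - actual / ideal

# Rates in proteins/minute and slowdown factors taken from the results above.
loss = combined_loss([4.37, 6.64], [1.18, 1.05])
```

Under that assumption the loss comes out at a little over 9%, in line with the overall figure quoted on the slide.
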

29
Further Steps
  • Preset 1500 tasks: is it optimal?
  • Depends on the available nodes.
  • Few sites/many nodes per site
  • Try to integrate more nodes in the processing
  • Few nodes/many sites
  • SETI-like (currently not feasible)
  • Expected speed-up: close to optimal?
  • Evaluate that.
  • We collect the results in a huge file (>100 MB);
    not all results are necessary

30
Thanks. Questions?