Title: KDE Advantages
1 Workflow in DiscoveryNet
Prof. Yike Guo
Dept. of Computing, Imperial College London
InforSense Limited
2 What is Discovery Net
- Funding: One of the six UK national e-science projects (2.2 million pounds)
- Goal: Towards knowledge unification by constructing global knowledge discovery services
- Focus on life science
- Key Technologies
  - Open Service Computing
  - High Throughput Devices and Real Time Data Mining
  - Real Time Data Integration and Information Structuring
  - Cross Domain Knowledge Discovery and Management
  - Discovery Workflow and Discovery Planning
3 Discovery Net Snapshot
4 Discovery Net Architecture
DPML, Web/Grid Services, OGSA
D-Net Clients: End-user applications and user interface allowing scientists to construct and drive knowledge discovery activities
D-Net Middleware: Managing, executing and deploying discovery plans (workflows) for distributed knowledge discovery and access to distributed resources and services
High Performance Communication Protocol (GridFTP, DSTP..), Grid Infrastructure (GSI)
Computation and Data Resources: Distributed databases, compute servers and scientific devices.
5 Discovery Net Implementation
- A Grid-enabled knowledge discovery platform built on open service standards, making use of Grid technology for high performance and distributed computing
- Service-based implementation allows easy construction and deployment of new composed knowledge discovery services
6 Achievements
- Commercialized by InforSense Ltd (Kensington Discovery Edition) with >20 large installations in the life science sector in one year
- Most Innovative Award of HPC Challenge, SC2002
7 Workflow in DiscoveryNet
- Workflow not only provides
  - provenance information (a macro program of recorded user actions)
  - operation logic (a recipe on how to do things)
- But also provides
  - a means for information/application/service composition
  - a strategy for distributed computation (in a grid environment)
  - a model for defining and deploying new services
  - a mechanism for collaborative intelligence (a method to manage cohabitation of scientists)
- In one word, workflow represents the Knowledge of Action.
- That is why we call the workflows in discovery informatics Discovery Plans.
- DiscoveryNet contributes the technology of Enterprise Discovery Planning!
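The two roles named above, operation logic and provenance, can be illustrated with a minimal sketch. This is not Discovery Net code: the `Step` and `Workflow` classes and the toy steps are hypothetical, and steps are assumed to be added in dependency order.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    """One unit of work (hypothetical structure, not a D-Net type)."""
    name: str
    func: Callable[..., object]
    inputs: List[str] = field(default_factory=list)  # names of upstream steps


@dataclass
class Workflow:
    """A recipe (ordered steps) that also records what it did."""
    steps: List[Step] = field(default_factory=list)
    provenance: List[dict] = field(default_factory=list)

    def add(self, name, func, inputs=()):
        # Assumption: callers add steps in dependency order.
        self.steps.append(Step(name, func, list(inputs)))
        return self

    def run(self) -> Dict[str, object]:
        results: Dict[str, object] = {}
        for step in self.steps:
            args = [results[i] for i in step.inputs]
            results[step.name] = step.func(*args)
            # Provenance: a replayable record of the action taken.
            self.provenance.append({"step": step.name, "inputs": list(step.inputs)})
        return results


wf = (
    Workflow()
    .add("load", lambda: [3, 1, 2])
    .add("sort", sorted, inputs=["load"])
    .add("top", lambda xs: xs[-1], inputs=["sort"])
)
results = wf.run()
```

After `run()`, `wf.provenance` lists the executed steps in order, which is the "macro program of recorded user actions" in miniature.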
8 Discovery Process Markup Language: Workflow Representation
- Workflow: Discovery Planning by Service Composition
- Towards a Standard Workflow Representation for Discovery Informatics: Discovery Process Markup Language (DPML). Sorry for another standard, but it may be useful for
  - Discovery Planning: Recording and managing a collaboratively built discovery process.
  - Distributed Service Composition: Components organised by the workflow can execute anywhere
  - Discovery Plans as Collaborative Intellectual Property: Discovery Plans can be stored, reused, audited, refined and deployed in various forms
D-Net Workflow for Genome Annotation: 16 services executing across the Internet
9 Action Abstraction: Making Nodes
Applications/Services
Functional Abstraction (parameters, meta data, API)
Provenance Abstraction (history, control protocol)
Data Abstraction (object model, data type mapping)
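A rough sketch of how the three abstractions could sit on one node follows. The `Node` class, its field names and the `top_k` service are all hypothetical illustrations; the slides do not describe the actual D-Net node interface.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Node:
    """Hypothetical wrapper turning an application/service into a workflow node."""
    name: str
    service: Callable[..., Any]      # the wrapped application or service
    parameters: Dict[str, Any]       # functional abstraction: parameters
    input_type: type                 # data abstraction: accepted data type
    output_type: type                # data abstraction: produced data type
    history: List[Dict[str, Any]] = field(default_factory=list)  # provenance

    def execute(self, data: Any) -> Any:
        if not isinstance(data, self.input_type):
            raise TypeError(f"{self.name} expects {self.input_type.__name__}")
        result = self.service(data, **self.parameters)
        if not isinstance(result, self.output_type):
            raise TypeError(f"{self.name} must return {self.output_type.__name__}")
        # Provenance abstraction: record what ran and with which settings.
        self.history.append({"node": self.name, "parameters": dict(self.parameters)})
        return result


def top_k(values, k):
    """Toy service: return the k largest values."""
    return sorted(values, reverse=True)[:k]


node = Node("top_k", top_k, {"k": 2}, list, list)
out = node.execute([5, 1, 9, 3])
```

The type checks stand in for data type mapping only loosely: a real system would convert between types rather than reject mismatches.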
10 Workflow Construction: Making Plans from Nodes
Data Transformation Model
Information Protocol
Presentation Method
11 Workflow Deployment: Making Nodes from Plans
Workflow
Functional Abstraction (parameters, meta data)
Provenance Abstraction (history, control protocol)
Data Abstraction (data type mapping)
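The idea of making a node from a plan can be sketched as wrapping a whole plan behind a single callable that exposes only selected parameters. Everything here is an assumption made for illustration: `deploy_as_node`, the linear plan format and the toy steps are hypothetical, and the real deployment engine works on DPML rather than Python tuples.

```python
from typing import Any, Callable, Dict, List, Tuple

# A plan step: (step name, function, default parameters). Hypothetical format.
StepSpec = Tuple[str, Callable[..., Any], Dict[str, Any]]


def deploy_as_node(name: str, steps: List[StepSpec],
                   exposed: Dict[str, Tuple[str, str]]) -> Callable[..., Any]:
    """Wrap a linear plan as one callable node.

    `exposed` maps a public parameter name to (step name, step parameter),
    so callers see only the parameters the plan author chose to publish.
    """
    def node(data: Any, **overrides: Any):
        unknown = set(overrides) - set(exposed)
        if unknown:
            raise TypeError(f"unknown parameters: {sorted(unknown)}")
        history = []
        for step_name, func, defaults in steps:
            params = dict(defaults)
            for public, (target, param) in exposed.items():
                if target == step_name and public in overrides:
                    params[param] = overrides[public]
            data = func(data, **params)
            history.append({"step": step_name, "parameters": params})
        return data, history

    node.__name__ = name
    return node


def scale(values, factor=1.0):
    return [v * factor for v in values]


def keep_above(values, threshold=0.0):
    return [v for v in values if v > threshold]


plan = [("scale", scale, {"factor": 2.0}), ("filter", keep_above, {"threshold": 3.0})]
service = deploy_as_node("scale_and_filter", plan, {"cutoff": ("filter", "threshold")})
result, history = service([1.0, 2.0, 3.0], cutoff=4.5)
```

The deployed node keeps the same three facets as a primitive node: published parameters, a history of what ran, and a fixed input/output shape.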
12 Workflow Deployment
- Workflow Deployment: On-demand rapid solution/service construction
- Towards a Dynamic Deployment of Knowledge Discovery Procedures
- Deployment Engine allows users to build and publish solutions based on DPML code coordinating remotely executed components
- My Discovery: New and personalised discovery solutions, described in DPML, can be rapidly built as the tools and key internal IP within an organisation
- Storage and Reporting Servers allow users to share DPML procedures and to generate workflow audit reports
Discovery Component
Report
Discovery Workflow in DPML
Batch processing
Discovery Service
13 Workflow Integration: Knowledge Unification
- Knowledge Unification: Dynamic construction of schemas to organise related cross-domain analysis results and background knowledge
- Towards a Knowledge Schema Framework for integrative discovery
- A Mechanism for Indexing and Annotating Discovery Results: A querying and browsing system for discovered knowledge
- Discovery Plan based Knowledge Management: An abstraction structure of discovery plans to organise enterprise-level discovery activities: subjects, projects, experiments.
14 Efficient access is provided to different types
of data (numeric, text, image, etc.) drawn from
multiple storage sources (files, databases,
Internet).
In this example, Kensington Discovery Edition is
being used for real time genomic data analysis in
a distributed environment.
16 Real-time sequencing
Distributed annotation resources
This complex nucleotide annotation workflow takes
high-throughput sequence data in London and
annotates it using information and computational
resources distributed worldwide.
17 The workflow is created from reusable components and
accesses more than 20 computational services, 15
databases and 25 different computing systems.
19 Visual verification of predictions performed
using Artemis2 editor integrated as a component
into the KDE workflow.
21 Interactive analysis to support a wider research
community in visualisation and verification of
annotation results.
23 Workflow and distributed computation grid created
with Kensington Discovery Edition from
InforSense.
24 In this example we will use the capabilities of
TextSense to search the annotations for one gene
and retrieve the diseases it might be related to.
We first start with a table containing our gene.
26 Now we proceed with retrieving similar genes
based on BLAST searches and annotating them with
information from various online sources.
28 After that, the table holds the Medline abstracts
relating to the genes.
29 So we want to search these abstracts for
occurrences of known diseases. We do this using a
TextSense routine.
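The slides do not show how the TextSense routine works, so the following is only a stand-in: a plain dictionary lookup with word-boundary matching, written with the standard `re` module. The function name `find_terms`, the abstract texts and the disease list are invented for illustration.

```python
import re
from typing import Dict, Iterable, List


def find_terms(abstracts: Dict[str, str], terms: Iterable[str]) -> Dict[str, List[str]]:
    """Return, per abstract, the known terms that occur in its text.

    Matching is case-insensitive and anchored on word boundaries, so
    "asthma" does not match inside a longer word.
    """
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in terms
    }
    return {
        doc_id: [term for term, pattern in patterns.items() if pattern.search(text)]
        for doc_id, text in abstracts.items()
    }


# Toy input, not real Medline content.
abstracts = {
    "a1": "Mutations in this gene are associated with breast cancer and diabetes.",
    "a2": "No disease association was reported for the homologue.",
}
hits = find_terms(abstracts, ["breast cancer", "diabetes", "asthma"])
```

A production text-mining routine would also handle synonyms, abbreviations and negation, none of which this sketch attempts.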
31 Disease occurrences in each annotation can now be
further explored.
33 In order to gain a better understanding of the
relationships between different diseases, we will
view them as association rules.
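As a reminder of what an association rule measures, here is a minimal sketch that scores pairwise rules "A implies B" by support and confidence over sets of diseases found per abstract. The function `pair_rules`, the thresholds and the placeholder disease names are assumptions; the slides do not say which rule-mining algorithm Kensington uses.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple


def pair_rules(transactions: List[Set[str]], min_support: float,
               min_confidence: float) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Score every ordered pair (a, b) as the rule "a implies b".

    support    = fraction of transactions containing both a and b
    confidence = fraction of transactions containing a that also contain b
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = {}
    for a, b in permutations(items, 2):
        count_a = sum(1 for t in transactions if a in t)
        count_ab = sum(1 for t in transactions if a in t and b in t)
        support = count_ab / n
        confidence = count_ab / count_a  # count_a >= 1 because a came from the data
        if support >= min_support and confidence >= min_confidence:
            rules[(a, b)] = (support, confidence)
    return rules


# Each set holds the diseases mentioned in one abstract (placeholder names).
docs = [
    {"disease_a", "disease_b"},
    {"disease_a", "disease_b"},
    {"disease_a"},
    {"disease_c"},
]
rules = pair_rules(docs, min_support=0.5, min_confidence=0.6)
```

Here "disease_b implies disease_a" has higher confidence than the reverse rule, because every abstract mentioning disease_b also mentions disease_a.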
35 Having performed this analysis, we want to
investigate further how the frequency of these terms
changes over the years.
37 Our new table contains the counts of appearances
of different words in Medline articles over a
four-year period.
38 Now we use Kensington's core analysis techniques
to transform the counts into frequencies and
explore two different normalisation options.
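A small sketch of this transformation is given here, assuming that the two normalisation options are the ones the later slides name: absolute frequencies and relative year-on-year changes in frequency. The function names and the counts are invented; frequency is taken to mean a word's count divided by the total count of all tracked words in that year, which is itself an assumption.

```python
from typing import Dict, List


def to_frequencies(counts: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Divide each word's yearly count by that year's total over all words."""
    n_years = len(next(iter(counts.values())))
    totals = [sum(series[y] for series in counts.values()) for y in range(n_years)]
    return {
        word: [c / t for c, t in zip(series, totals)]
        for word, series in counts.items()
    }


def relative_changes(freqs: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Year-on-year change relative to the previous year's frequency.

    Assumes no frequency is zero; a zero would need special handling.
    """
    return {
        word: [(b - a) / a for a, b in zip(series, series[1:])]
        for word, series in freqs.items()
    }


# Toy counts for two words over four years.
counts = {"x": [10, 20, 40, 40], "y": [90, 80, 60, 160]}
freqs = to_frequencies(counts)   # option 1: absolute frequencies
rel = relative_changes(freqs)    # option 2: relative changes per year
```

Absolute frequencies keep the scale of each word, so common words dominate; relative changes remove the scale and compare only the shape of the trend.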
40 And now we use clustering to find the words with
a similar profile over the years.
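The slides do not name the clustering algorithm, so the sketch below swaps in one simple choice: single-linkage clustering cut at a distance threshold, which amounts to grouping words whose yearly profiles are connected by chains of small Euclidean distances. The function `threshold_clusters`, the threshold value and the profiles are hypothetical.

```python
from math import dist
from typing import Dict, List, Sequence


def threshold_clusters(profiles: Dict[str, Sequence[float]],
                       max_distance: float) -> List[List[str]]:
    """Group words whose profiles are linked by distances <= max_distance."""
    words = sorted(profiles)
    parent = {w: w for w in words}

    def find(w: str) -> str:
        # Union-find root lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if dist(profiles[a], profiles[b]) <= max_distance:
                parent[find(b)] = find(a)

    groups: Dict[str, List[str]] = {}
    for w in words:
        groups.setdefault(find(w), []).append(w)
    return sorted(groups.values())


# Toy yearly frequency profiles for three words.
profiles = {
    "alpha": [0.10, 0.20, 0.40, 0.20],
    "beta": [0.10, 0.25, 0.40, 0.20],
    "gamma": [0.90, 0.80, 0.60, 0.80],
}
clusters = threshold_clusters(profiles, max_distance=0.1)
```

The same function can be fed either absolute frequencies or relative changes, which is one way to compare the two branches of the analysis.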
42 The initial results are based on the relative
changes in frequency each year.
43 The other branch will explore the
absolute-frequency approach.
45 In this demo we have used TextSense's
capabilities to extract and analyse relevant
information from a multitude of documents
available in public and private data sources.
46 Workflow in Discovery Informatics
- Supporting Compositional Services: Workflow as Service Integrator
- Supporting Collaborative Research: Workflow as Discovery Plan
- Supporting Provenance: Workflow as the Record of Research History
- From Information Integration to Knowledge Unification: Workflow as Knowledge Schema
47 Workflow in DiscoveryNet