Title: Knowledge and Provenance: A knowledge model perspective


1
Knowledge and Provenance: A knowledge model perspective
  • Carole Goble,
  • University of Manchester, UK

2
Talk roadmap
What is this provenance about and for?
Knowledge for Provenance
The Provenance of Knowledge
Knowledge technologies
How do we represent knowledge for and about
provenance?
Where do knowledge assertions come from?
3
my Context
  • Knowledge-driven Middleware for data
    intensive in silico experiments in biology
  • http://www.mygrid.org.uk

4
A real bio provenance log
5
Any and every experimental item attracts
provenance (so long as you can ID it).
  • Experimental design components
  • workflow specifications; query specifications;
    notes describing objectives; applications;
    databases; relevant papers; the web pages of
    important workers; services
  • Experimental instances that are records of
    enacted experiments
  • data results; a history of services invoked by a
    workflow engine; instances of services invoked;
    parameters set for an application; notes
    commenting on the results
  • Experimental glue that groups and links design
    and instance components
  • a query and its results; a workflow linked with
    its outcome; links between a workflow and its
    previous and subsequent versions; a group of all
    these things linked to a document discussing the
    conclusions of the biologist

6
Provenance is metadata
  • intended for sharing, retrieving, integrating,
    aggregating and processing.
  • generated with the hope that it is comprehensive
    enough to be future-proofed.
  • recorded for those who we do not yet know will
    use the object and who will likely use it in a
    different way.
  • machine computational: free text is of limited help.
  • Provenance is the knowledge that makes
  • an item interpretable and reusable within a
    context
  • an item reproducible or at least repeatable.
  • It's part of the information model of any system

7
Question: What ATPase superfamily proteins are
found in mouse?
1. Q9CQV8, O70468 143B_MOUSE from Swiss-Prot
   version 30, 05/11/02, 16:45 GMT, EBI server.
2. O70455, P54775 143B_MOUSE from Swiss-Prot
   version 29, 05/11/02, 16:45 GMT, local copy.
3. P43686 and P54775 derived by a distributed
   query over DB1 and DB2.
4. InterPro (no particular version) is a pattern
   database for protein superfamilies and domains
   for GPCRs but you need an account.
5. The publicly available workflow mouse ATPase
   (http://www.somelab.edu/bio/carole/wf/3345.wsfl)
   will generate the result from data in your
   personal repository and you have permission to
   run the services it needs. Click to run it.
6. The Attwood lab expertise is in nucleotide
   binding proteins (ATPase superfamily proteins
   are nucleotide binding proteins).
7. Jones published a new paper on this in Nature
   Genetics two weeks ago, and you have an account
   to access it on-line.
8. Smith in your lab asked this question yesterday
   and the answer he got is annotated by a
   commentary in his e-Log Book.
9. P43686 (human) calculated by applying the
   algorithm ABC located at NCBI using data in
   database AAA.
Provenance (know-wherefrom)
Database query (know-what)
Replicas (know-which)
Virtual data products (know-how)
Ontology and Inference (know-whether)
Workflow (know-how)
Authorisation, Authentication and
Accounting (know-who)
Personalised profile (know-whom-to)
Collaboration community (know-where, know-when)
Explanation (know-why)
Digital archive (know-which)
Annotation notes (know-that)
8
Provenance is contextual metadata
  • We look at the same things in different ways and
    different things in the same way
  • Our data alone does not describe our work
  • We have to capture this context.

Hero: http://hero.geog.psu.edu/Hero_knowledge_management.pdf
(downloaded 30/11/03)
9
Provenance forms
  • Derivations
  • A path like a workflow, script or query.
  • Linking items, usually in a directed graph.
  • An explanation of when, by whom and how something
    was produced.
  • Execution: process-centric
  • Annotations
  • Attached to items or collections of items, in a
    structured, semi-structured or free text form.
  • Annotations on one item or linking items.
  • An explanation of why, when, where, who, what,
    how.
  • Data-centric

10
Workflows as in silico experiments
  • Freefluo workflow enactment engine
  • WSFL
  • Scufl
  • Semantic Workflow discovery
  • Finding workflows that others have done, and that
    I have done myself
  • Semantic service discovery
  • Finding classes of services
  • Guiding service composition
  • (We don't do automated composition)
  • Dynamic workflow enactment: service discovery and
    invocation
  • Choose service instances when running the workflow
  • User involvement

11
Semantic discovery: services and workflows
  • Services and workflows in registry have RDF and
    OWL descriptions
  • Selection by the types of inputs they use,
    outputs they produce, the bioinformatics tasks
    they perform
  • Querying using RDQL over RDF UDDI registry for
    operational metadata
  • Matching using FaCT OWL classification for
    concept-based metadata

A registry browser
A workflow wizard
12
Provenance forms in myGrid
  • Derivations
  • FreeFluo Workflow Enactment Engine provides a
    detailed provenance record stored in the myGrid
    Information Repository (mIR) describing what was
    done, with what services and when
  • XML document, soon to be an RDF model
  • Annotations
  • Every mIR object has Dublin Core provenance
    properties described in an attribute value model

13
Provenance of data
  • Operational execution trail

[Diagram: a process (start time, end time) with
input Gene AC005412.6 and output SNP000010197;
run_for urn: Clare Jennings;
by_service lsid: HGVBase_retrieve]
14
Provenance of knowledge
  • Declarative semantic execution trail

[Diagram: the statement
Gene AC005412.6 contains_single_nucleotide_polymorphism SNP000010197,
"as stated by" a process (start time, end time) with
input Gene AC005412.6 and output SNP000010197;
run_for urn: Claire Jennings;
by_service lsid: HGVBase_retrieve]
15
Provenance of knowledge
  • Trust and attribution

[Diagram: the statement
Gene AC005412.6 contains_single_nucleotide_polymorphism SNP000010197,
"disputed by" urn: Carole Goble and
"as stated by" a process (start time, end time) with
input Gene AC005412.6 and output SNP000010197;
run_for urn: Claire Jennings;
by_service lsid: HGVBase_retrieve]
16
Provenance of knowledge
  • Aggregation and integration

[Diagram: a second process (start time, end time),
run_for urn: Bill Jones, by_service lsid: BIGDbretrieve,
linked by "as stated by" to the same statement
Gene AC005412.6 contains_single_nucleotide_polymorphism SNP000010197]
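The "as stated by" and "disputed by" links on slides 14 to 16 can be read as statements about statements. The following is a minimal Python sketch of that idea, not the myGrid data model: the Assertion class and the aggregate function are hypothetical names introduced here, and the source strings simply combine the service and person labels shown on the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Assertion:
    """One knowledge statement plus who stated or disputed it (hypothetical model)."""
    subject: str
    predicate: str
    obj: str
    stated_by: set = field(default_factory=set)
    disputed_by: set = field(default_factory=set)


def aggregate(records):
    """Merge records that make the same (subject, predicate, object) claim."""
    merged = {}
    for subject, predicate, obj, relation, who in records:
        assertion = merged.setdefault(
            (subject, predicate, obj), Assertion(subject, predicate, obj)
        )
        if relation == "as stated by":
            assertion.stated_by.add(who)
        elif relation == "disputed by":
            assertion.disputed_by.add(who)
        else:
            raise ValueError(f"unknown relation: {relation!r}")
    return merged


CLAIM = ("AC005412.6", "contains_single_nucleotide_polymorphism", "SNP000010197")
records = [
    CLAIM + ("as stated by", "HGVBase_retrieve run for Claire Jennings"),
    CLAIM + ("as stated by", "BIGDbretrieve run for Bill Jones"),
    CLAIM + ("disputed by", "Carole Goble"),
]
merged = aggregate(records)
claim = merged[CLAIM]
print(len(claim.stated_by), "supporting,", len(claim.disputed_by), "disputing")
```

Keeping attribution separate from the claim itself is what lets independent sources be aggregated onto one statement while the dispute stays visible.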
17
20,000 feet and ground level
  • Top Down provenance
  • What is going on?
  • Unification and summaries of collective
    provenance knowledge.
  • Collaborative, Awareness, Experience base,
    Scientific Corporate memory.
  • What projects have something to do with human
    SNPs?
  • What experiments use the PSI-BLAST service
    regardless of version?
  • Bottom Up provenance
  • Where did this data object http://doh.dah.ac.uk/
    come from?
  • Which version of Swiss-Prot was run in workflow
    http://blah.ac.uk/?

Build up layers of provenance knowledge
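The two directions of questioning can be sketched over a small derivation graph. This is an illustrative Python sketch only: the item and run names (snp_report, workflow_run_1 and so on) are invented for the example, and the dictionary layout is an assumption, not the format of any myGrid log.

```python
from collections import deque

# Hypothetical derivation records: output item -> (process, list of input items).
DERIVED_FROM = {
    "snp_report": ("workflow_run_2", ["snp_list"]),
    "snp_list": ("workflow_run_1", ["gene_record"]),
}


def lineage(item, derived_from):
    """Bottom-up question: which processes and inputs led to this item?"""
    steps, queue, seen = [], deque([item]), {item}
    while queue:
        current = queue.popleft()
        if current not in derived_from:
            continue  # a source item with no recorded derivation
        process, inputs = derived_from[current]
        steps.append((current, process, inputs))
        for parent in inputs:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return steps


def items_produced_by(process, derived_from):
    """Top-down question: which items did a given process produce?"""
    return sorted(out for out, (proc, _) in derived_from.items() if proc == process)


for step in lineage("snp_report", DERIVED_FROM):
    print(step)
print(items_produced_by("workflow_run_1", DERIVED_FROM))
```

The bottom-up walk answers "where did this come from?" for one item, while the top-down function summarises across all recorded items.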
18
Provenance for People and Machines
[Diagram: a spectrum from People (subjective,
contextual, manual/semi-automated) to Machines
(objective, context-free, automated), with the
labels Experiment, User, Trust, Services, Domain,
Data, Execution and Workflow placed along it]
19
1. Explicitly capture Context
  • Reuse methods and strategies (e.g., protocols)
  • Make explicit the situational bias that is
    normally implicit
  • Enable future generations of scientists to
    follow our work
  • To capture meaning, we must devise a way of
    representing concepts and their relationships

Hero: http://hero.geog.psu.edu/Hero_knowledge_management.pdf
(downloaded 30/11/03)
20
1. Explicitly capture Context
  • Using models and terms
  • that can be shared and interpreted
  • that are extensible and preclude premature
    restrictions
  • that are navigable and computationally processable

Hero: http://hero.geog.psu.edu/Hero_knowledge_management.pdf
(downloaded 30/11/03)
21
2. Bridge islands of exported provenance
Service 1
Service 2
Workflow 1
Experimental Investigation 1
Data 1
22
Not all exports are the same
Service 1
Service 2
Workflow 1
Experimental Investigation 1
Data 1
23
So we need to
  • Uniquely identify items through URIs and Life
    Science Identifiers (GSH/GSR/Handle.net)
  • Explicitly expose provenance by assertions in a
    common data model
  • Publish and share consensually agreed ontologies
    so we can share the provenance metadata and add
    in background knowledge
  • Then we can query, filter, integrate and
    aggregate the provenance metadata
  • and reason over it to infer more provenance
    metadata using rules
  • and attribute trust to the provenance
  • Flexibly, so that we do not cast models and terms
    in stone, and so can cope with different degrees
    of description.

What's an Ontology?
  • A common vocabulary of terms
  • Some specification of the meaning of the terms
  • Concepts, relationships, axioms
  • A shared consensual understanding for people and
    machines
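Since unique identification is the first requirement listed, here is a small Python sketch of splitting a Life Science Identifier into its parts. It assumes the URN layout from the LSID specification (urn:lsid:authority:namespace:object, with an optional revision), which the slides do not spell out; the parse_lsid function and the example.org identifier are hypothetical.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> LSID:
    """Split an LSID of the assumed form urn:lsid:authority:namespace:object[:revision]."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    if not all(parts[2:5]):
        raise ValueError(f"empty LSID component in {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


# Hypothetical identifier for a workflow, revision 1.
print(parse_lsid("urn:lsid:example.org:workflow:3345:1"))
```

Because the identifier carries its own authority and namespace, items exported by different services can be referred to without a shared database.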
24
W3C Metadata language/model: Resource Description
Framework (RDF)
  • Common model for metadata
  • Assertions as triples (subject, predicate,
    object) forming graphs.
  • Associate URIs (LSIDs) with other URIs (LSIDs).
  • Associate URIs with OWL concepts (which are
    URIs).
  • RDQL, repositories, integration tools,
    presentation tools
  • Query over, Link together, Aggregate, Integrate
    assertions.
  • Avoids pre-commitment
  • Self-describing
  • Incremental
  • Extensible
  • Advantage and drawback.

Graphic based on Tim Berners-Lee:
http://www.w3.org/2003/Talks/0521-www-keynote-tbl/slide22-0.html
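The triple model can be illustrated without any RDF library. The Python sketch below treats a graph as a plain set of (subject, predicate, object) tuples and answers wildcard pattern queries, roughly what a triple-pattern query language does. The match function, the run identifier and the "note" annotation are hypothetical; the other predicate names echo the earlier provenance slides.

```python
def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


RUN = "urn:lsid:example.org:run:1"  # hypothetical identifier for one workflow run

# Two independently produced sets of assertions about the same run.
execution_log = {
    (RUN, "input", "Gene AC005412.6"),
    (RUN, "output", "SNP000010197"),
    (RUN, "by_service", "HGVBase_retrieve"),
}
annotations = {
    (RUN, "run_for", "Claire Jennings"),
    ("SNP000010197", "note", "check against a second database"),
}

# Aggregation is set union: no schema has to be agreed before merging.
graph = execution_log | annotations

print(match(graph, s=RUN, p="output"))
print(match(graph, o="SNP000010197"))
```

The lack of pre-commitment cuts both ways, as the slide says: anything can be asserted and merged, but nothing stops two sources from using different predicates for the same relationship.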
25
Bridging islands
Service 1
Service 2
Workflow 1
Experimental Investigation 1
Data 1
26
Bridging islands: Concepts and LSIDs
[Diagram: Service 1, Service 2, Workflow 1,
Experimental Investigation 1 and Data 1, linked
to one another by RDF assertions]
27
W3C Ontology language/model: OWL
  • Continuum of expressivity
  • Concepts, roles, individuals, axioms
  • From simple frames to description logics
  • Sound and complete formal semantics
  • Compositional and property based
  • Reasoning to infer classification
  • Eas(ier) to extend and evolve and merge
    ontologies
  • A web language
  • Tools, tools, tools!

28
Bridging islands: Concepts and LSIDs
[Diagram: Service 1, Service 2, Workflow 1,
Experimental Investigation 1 and Data 1, linked
to one another by RDF assertions]
29
Bridging islands: Concepts and LSIDs
[Diagram: Service 1, Service 2, Workflow 1,
Experimental Investigation 1 and Data 1, with
their items identified by LSIDs and linked to one
another by RDF assertions]
30
Layers of Knowledge Languages
Attribution
Explanation
Rules Inference
Ontologies
Metadata
Standard Syntax
Identity
Wedding cake courtesy of Tim Berners-Lee
31
myGrid: everything has a concept and an LSID
Workflows
Provenance record of workflow runs
Notes
People
Data holdings
Services
32
Linking objects to objects via URIs and LSIDs
People to notify of the workflow status
Provenance of the workflow template. Related
workflows.
Ontologies describing workflows
33
Lymphocyte and neutrophil are subsumed by the
concept white blood cell
Generated link anchors
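Subsumption over an asserted hierarchy can be sketched as a walk up parent links. This Python example is a simplification: it only follows stated parent relationships, whereas a description logic reasoner infers classification from concept definitions. The PARENTS table and the subsumed_by function are hypothetical, and the top-level "cell" concept is an assumption added for illustration.

```python
# Hypothetical fragment of a cell-type hierarchy: concept -> direct parent concepts.
PARENTS = {
    "lymphocyte": {"white blood cell"},
    "neutrophil": {"white blood cell"},
    "white blood cell": {"cell"},  # "cell" is an assumed top concept
}


def subsumed_by(concept, ancestor, parents):
    """True if `ancestor` equals `concept` or is reachable from it via parent links."""
    stack, seen = [concept], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return False


def annotated_with(items, concept, parents):
    """Select items whose annotation concept falls under the query concept."""
    return sorted(name for name, c in items.items() if subsumed_by(c, concept, parents))


# Hypothetical annotated data items.
items = {"sample_a": "lymphocyte", "sample_b": "neutrophil", "sample_c": "cell"}
print(annotated_with(items, "white blood cell", PARENTS))
```

A query for the general concept therefore retrieves items annotated with either of the more specific ones, which is how link anchors can be generated for related items.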
34
Annotating a workflow log with concepts
1. Choose the ontology
2. Select an area to annotate with
3. Select the concept
4. Provide a description
5. Create the annotation
35
Generating provenance
Data and metadata from the run
startTime, endTime, service instances invoked
RDF and OWL
Workflow execution Template
Scufl
RDF and OWL
mIR
Identify workflow
Execution Provenance log
FreeFluo WFEE
Bind services
Input data parameters
Knowledge Provenance log
Workflow knowledge template
RDF registry
OWL descriptions
RDF and OWL
Knowledge arising from workflow
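The execution side of this picture (start time, end time, service instances invoked) can be sketched as a wrapper around each service call. The Python below is a hedged illustration, not how FreeFluo records its log: run_with_provenance, the log record layout and the stand-in service are all hypothetical.

```python
from datetime import datetime, timezone


def run_with_provenance(service_name, service, inputs, log):
    """Invoke a service and append an execution provenance record to `log`."""
    start = datetime.now(timezone.utc)
    output = service(**inputs)
    end = datetime.now(timezone.utc)
    log.append({
        "service": service_name,
        "inputs": dict(inputs),
        "output": output,
        "startTime": start.isoformat(),
        "endTime": end.isoformat(),
    })
    return output


def fake_retrieve(gene):
    """Stand-in for a real retrieval service; returns a fixed identifier."""
    return "SNP000010197"


provenance_log = []
result = run_with_provenance(
    "HGVBase_retrieve", fake_retrieve, {"gene": "AC005412.6"}, provenance_log
)
print(result, len(provenance_log))
```

Records like these are the operational layer; the knowledge layer would add the assertion that the gene contains the returned polymorphism, linked back to this record.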
36
P. Afflard et al., The Grid(s)? @ Novartis, presented
at PRISM PharmaGrid retreat, July 2003
37
William Pike, Ola Ahlqvist, Mark Gahegan, Sachin
Oswal: Supporting Collaborative Science through a
Knowledge and Data Management Portal. In: 1st
Semantic Web Conference (ISWC2003), Workshop on
Retrieval of Scientific Data, Florida, USA,
October 2003
38
Two views of a gravity model concept from
the Hero CODEX web tool
William Pike, Ola Ahlqvist, Mark Gahegan, Sachin
Oswal: Supporting Collaborative Science through a
Knowledge and Data Management Portal. In: 1st
Semantic Web Conference (ISWC2003), Workshop on
Retrieval of Scientific Data, Florida, USA,
October 2003
  • An ontological description shows how one
    geoscientist constructs a model
  • a social network reveals which users favour
    different instances of the model, with edge
    length suggesting the degree of support.

39
Collaboratory for Multi-Scale Chemical Science
CMCS Pedigree Graph portlet showing provenance
relationships between resources (colour coded by
original relationship type).
CMCS Pedigree Browser showing the metadata and
relationships of the selected data set.
40
Provenance dimensions connected by concepts and
identifiers
[Diagram: provenance dimensions linked by concepts
and identifiers. Labels: project, Services,
Workflow instances, Author, workflow template]
Based on http://www.w3.org/2003/Talks/0521-www-keynote-tbl/slide22-0.html
41
Reflections: annotations
  • The annotation metadata model for myGrid holdings
    is a graph
  • If it waddles like RDF and quacks like RDF, it's
    RDF
  • Experiments in RDF scalability
  • Co-existence of RDF and other data models
    (relational)
  • Acquisition of annotations and adverts
  • Automated by mining WSDL docs, mining ws-info
    docs
  • Deep annotation works OK for bioinformatic
    service concepts (it's an EMBL record) but
  • Annotating with biologically meaningful concepts
    is harder
  • Data in the mIR (it's a lymphocyte)
  • Manual annotation cost is high!
  • Service/workflow publication tools
  • Dealing with change
  • Ontology changes; service changes; annotations
    change.

42
Random Thoughts
  • Where does the knowledge come from (see Luc)?
  • How do we model trust (see Luc)?
  • Scalability of Semantic Web technologies?
  • Visualisation of knowledge (see Monica)?
  • What's the lifecycle of provenance?
  • Different knowledge models for different
    disciplines?
  • Layers of provenance
  • Provenance that is domain knowledge
  • Provenance for context vs execution
  • People vs machine
  • Different models for different items but still
    needs to be integrated
  • Technologies for sharing and integrating that are
    flexible.

knowledge
workflow
provenance
43
Talk provenance
  • myGrid http://www.mygrid.org.uk
  • Jun Zhao, Mark Greenwood, Chris Wroe, Phil Lord,
    Chris Greenhalgh, Luc Moreau, Robert Stevens
  • Hero http://hero.geog.psu.edu/
  • William Pike, Ola Ahlqvist, Mark Gahegan, Sachin
    Oswal
  • Collaboratory for Multi-Scale Chemical Science
    (CMCS)
  • James D. Myers, Carmen Pancerella, Carina
    Lansing, Karen L. Schuchardt, Brett Didier
  • Chimera
  • Michael Wilde, Ian Foster
  • Knowledge Space
  • Novartis
  • And special thanks to Ian Cottam for heroic
    support when my laptop died yesterday. Afternoon.