Provenance in a Collaborative Bio-database RAASWiki - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Provenance in a Collaborative Bio-database RAASWiki


1
Provenance in a Collaborative Bio-database RAASWiki
  • Donald Dunbar, Jon Manning
  • Queen's Medical Research Institute
  • University of Edinburgh
  • Use Cases for Provenance
  • April 20th 2009

2
Provenance in Bio-databases including RAASWiki
  • Donald Dunbar, Jon Manning
  • Queen's Medical Research Institute
  • University of Edinburgh
  • Use Cases for Provenance
  • April 20th 2009

3
Plan
bio-databases
provenance
RAASWiki
collaborative knowledgebases
4
Biological databases
  • Sequences: Ensembl, Entrez
  • Structure: PDB
  • Expression: GEO, ArrayExpress
  • Function: Gene Ontology
  • Interaction: MINT, BIND, KEGG
  • Warehouses: GeneCards, IUPHAR
  • Literature: PubMed

5
How do they handle provenance?
Ensembl produces genome databases for vertebrates
and other eukaryotic species, and makes this
information freely available online.
Gene ID histories (with stable ID)
Evidence for gene predictions
Links to other databases (e.g. UniProt)
6
How do they handle provenance?
The PDB archive contains information about
experimentally-determined structures of proteins,
nucleic acids, and complex assemblies.
Primary citation
History: deposition and last update
Raw data and protocols
7
How do they handle provenance?
Gene Expression Omnibus: a gene
expression/molecular abundance repository
supporting MIAME compliant data submissions, and
a curated, online resource for gene expression
data browsing, query and retrieval.
Standards compliance (protocols, data)
Links within database (microarrays, protocols)
Raw data and protocols
8
How do they handle provenance?
The Gene Ontology project provides a controlled
vocabulary to describe gene and gene product
attributes in any organism.
Evidence for gene annotation (experimental,
computational)
Links to original publications
No versioning, just updates
9
How do they handle provenance?
PubMed is a free search engine for accessing the
MEDLINE database of citations, abstracts and some
full text articles on life sciences and
biomedical topics.
Original source material, authors, abstracts
Unique PubMed ID (used by other databases)
Continual updates (new papers), occasional
retractions
10
How do they handle provenance?
GeneCards is a searchable, integrated database
of human genes that provides concise genomic,
proteomic, transcriptomic, genetic and functional
information on all known and predicted human
genes.
Lots of data from other databases
IDs/keys from sources
Lots of data integration based on IDs
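ID-based integration of this kind can be sketched in a few lines. The example below is an illustrative assumption, not GeneCards' actual method: the function name `integrate`, the input layout and the source names are all hypothetical. It merges records from several sources on a shared gene ID and keeps the source name next to every value, so each merged field can still be traced back to where it came from.

```python
from collections import defaultdict


def integrate(sources):
    """Merge per-source records that share a gene ID.

    sources: {source_name: {gene_id: {field_name: value}}}  (hypothetical layout)
    returns: {gene_id: {field_name: [(value, source_name), ...]}}
    """
    merged = defaultdict(lambda: defaultdict(list))
    for source_name, records in sources.items():
        for gene_id, fields in records.items():
            for field_name, value in fields.items():
                # Keep the originating source beside the value: this pair
                # is the minimal provenance the slide refers to.
                merged[gene_id][field_name].append((value, source_name))
    # Convert the nested defaultdicts to plain dicts for the caller.
    return {gene_id: dict(fields) for gene_id, fields in merged.items()}


# Hypothetical input: two sources describing the same placeholder gene ID.
example = integrate({
    "SourceA": {"GENE1": {"name": "renin"}},
    "SourceB": {"GENE1": {"name": "REN", "chromosome": "1"}},
})
```

Conflicting values are kept side by side rather than resolved, which leaves the choice of which source to trust to a curator.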
11
How do they handle provenance?
The IUPHAR database (IUPHAR-DB) integrates
peer-reviewed pharmacological, chemical, genetic,
functional and anatomical information on GPCRs,
ligand-gated ion channels and voltage-gated-like
ion channel subunits encoded by the human, rat
and mouse genomes.
Curated by experts
Original sources plus curation provenance
Suggested citations
12
Newer developments
WikiGenes is the first wiki system to combine the
collaborative and largely altruistic
possibilities of wikis with explicit authorship.
In view of the extraordinary success of Wikipedia
there remains no doubt about the potential of
collaborative publishing, yet its adoption in
science has been limited. Here I discuss a
dynamic collaborative knowledge base for the life
sciences that provides authors with due credit
and that can evolve via continual revision and
traditional peer review into a rigorous
scientific tool.
but...
13
RAASWiki
RAASWiki is a knowledgebase of information on the
renin-angiotensin-aldosterone system. While much
of the seed data were derived from pre-existing
databases such as KEGG and OMIM, supplementary
data not easily available through such resources
are also included. This includes short textual
reports on the genes involved, and more
experimentally-oriented information such as
animal models.
Important biology - hypertension
Automatic seeding of database (BioKB)
Collaborative editing (Wiki based, useful
functionality)
Genes, publications, animal models, datasets
14
RAASWiki provenance
Seeded data tagged with source database and date
Edits are tagged with editor and date
Comments are tagged with name and date
Wiki functionality allows versioning and roll back
Identifiers for source databases preserve
provenance
Crowd wisdom will hopefully ensure good quality
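The tagging scheme on this slide can be illustrated with a minimal sketch. It is not RAASWiki's implementation (which relies on the underlying wiki software); the class names `Revision` and `Record`, the field names and the identifier `EXAMPLE:0001` are assumptions made up for illustration. Seeded data carry a source database, source identifier and date; edits carry an editor and date; every version is kept so a roll back is possible.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Revision:
    """One version of a record, tagged with where it came from and when."""
    text: str
    changed_on: date
    source_db: Optional[str] = None  # set for seeded data, e.g. "KEGG"
    source_id: Optional[str] = None  # identifier in the source database
    editor: Optional[str] = None     # set for user edits


@dataclass
class Record:
    """A wiki-style record that never discards a revision."""
    name: str
    revisions: List[Revision] = field(default_factory=list)

    def seed(self, text, source_db, source_id, when):
        self.revisions.append(
            Revision(text, when, source_db=source_db, source_id=source_id))

    def edit(self, text, editor, when):
        self.revisions.append(Revision(text, when, editor=editor))

    def current(self):
        return self.revisions[-1]

    def roll_back(self, editor, when):
        """Restore the previous version by appending a copy of it,
        so the rolled-back edit stays visible in the history."""
        if len(self.revisions) < 2:
            raise ValueError("nothing to roll back to")
        previous = self.revisions[-2]
        self.revisions.append(Revision(
            previous.text, when,
            source_db=previous.source_db, source_id=previous.source_id,
            editor=editor))


# Hypothetical usage: seed from a source database, edit, then roll back.
ren = Record("REN")
ren.seed("Seeded summary", "KEGG", "EXAMPLE:0001", date(2009, 1, 5))
ren.edit("Edited summary", "editor_a", date(2009, 2, 1))
ren.roll_back("editor_b", date(2009, 2, 2))
```

Recording the roll back as a new revision, rather than deleting the unwanted edit, keeps the full who-and-when trail, at the cost of a longer history.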
15
RAASWiki provenance issues
How much detail (each edit, granularity,
versions)?
Who will use provenance data?
Different focus depending on data (who, when,
confidence)
How much should we rely on sources for provenance?
Annotation comments vs. changing data
Public vs. private data
Likely to become a big issue
16
What provenance do we need?
Example: Gene expression in a transgenic animal
[Diagram labels: gene expression measurements; gene annotation; where, when; when, what, how; public databases; output from machine; which identifiers; how; processing; integration; what and how did we select genes; data mining]

17
What provenance do we need?
Example: Curated gene database
[Diagram labels: database links; curation; contributor, date; source, identifiers, dates; curator input; development; verify, add, delete, modify; schema and interface changes; archive; versions, dates; curated database]
18
Collaborative knowledgebases
[Diagram labels: databases; knowledge; knowledgebase; experiments; papers]
19
Collaborative knowledgebase provenance issues
Confidence in data
Tracking data to its (real) source
Published papers do not contain all information
When is something (knowledge) finished?
Citing of knowledgebase records
Linking between knowledgebase records
Some sort of dynamic publication
20
Conclusions
  • In biology provenance is a mixed bag
  • We use mainly static databases
  • Usually source is clear but not much else
  • RAASWiki contains static and curated data
  • We have implemented a very rudimentary provenance
    scheme
  • Collaborative knowledgebases will need to address
    provenance in new ways