Capture, integration, and sharing of functional genomic data - PowerPoint PPT Presentation


1
Capture, integration, and sharing of functional
genomic data
  • Steve Oliver
  • Professor of Genomics
  • School of Biological Sciences
  • University of Manchester
  • http://www.cogeme.man.ac.uk
  • http://www.bioinf.man.ac.uk

2
What are biologists interested in?
3
GENOME
TRANSCRIPTOME
PROTEOME
METABOLOME
4
The nature of proteomics experiment data
  • Sample generation
      • Origin of sample
          • hypothesis, organism, environment, preparation,
            paper citations
  • Sample processing
      • Gels (1D/2D) and columns
          • images, gel type and ranges, band/spot
            coordinates
          • stationary and mobile phases, flow rate,
            temperature, fraction details
  • Mass spectrometry
      • machine type, ion source, voltages
  • In silico analysis
      • peak lists, database name and version, partial
        sequence, search parameters, search hits,
        accession numbers
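The four stages above map naturally onto nested records, one per stage, linked by a single experiment object. The following is a minimal Python sketch only. The class and field names (`SampleOrigin`, `GelSpot`, `MassSpecRun`, `DatabaseSearch`, `ProteomicsExperiment`) are hypothetical and are not the PEDRo schema. Column (chromatography) details are left out for brevity.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List

# Hypothetical record types, one per stage of a proteomics experiment.
# Names and fields are illustrative only; they are not the PEDRo schema.


@dataclass
class SampleOrigin:
    """Sample generation: where the sample came from and why."""
    hypothesis: str
    organism: str
    environment: str
    preparation: str
    citations: List[str] = field(default_factory=list)


@dataclass
class GelSpot:
    """Sample processing on a gel: one band or spot and its coordinates."""
    gel_type: str  # "1D" or "2D"
    image_uri: str
    x: float
    y: float


@dataclass
class MassSpecRun:
    """Mass spectrometry: instrument settings for one run."""
    machine_type: str
    ion_source: str
    voltages: Dict[str, float] = field(default_factory=dict)


@dataclass
class DatabaseSearch:
    """In silico analysis: a peak list searched against a sequence database."""
    database_name: str
    database_version: str
    peak_list: List[float]  # m/z values
    search_parameters: Dict[str, str]
    hit_accessions: List[str] = field(default_factory=list)


@dataclass
class ProteomicsExperiment:
    """One experiment, linking all four stages together."""
    origin: SampleOrigin
    spots: List[GelSpot]
    ms_runs: List[MassSpecRun]
    searches: List[DatabaseSearch]


def to_record(experiment: ProteomicsExperiment) -> dict:
    """Flatten to plain dicts and lists, e.g. before serialising to XML."""
    return asdict(experiment)
```

Keeping each stage in its own type means a record can be checked for completeness stage by stage before it is submitted.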

5
A Systematic Approach to Modelling, Capturing
and Disseminating Proteomics Experimental Data
  • http://pedro.man.ac.uk/

6
The PEDRo UML schema in reduced form
7
The Framework Around PEDRo
  • Lab-generated data are encoded using the PEDRo
    data entry tool, producing an XML (PEML) file for
    local storage or submission
  • Locally stored PEML files may be viewed in a web
    browser (with XSLT), allowing web pages to be
    quickly generated from datasets
  • Upon receipt of a PEML file at the repository
    site, a validation tool checks the file before
    entering it into the database
  • The repository (a relational database) holds
    submitted data, allowing various analyses to be
    performed, or data to be extracted as a PEML file
    or another format
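The submission path described above runs in four steps: encode, validate on receipt, store relationally, and extract again. It can be sketched with Python's standard library. This sketch rests on stated assumptions:

- The element names, the `REQUIRED` tuple and the one-table `field` layout are hypothetical stand-ins. They are not the real PEML format or the repository schema.
- The XSLT viewing step is not shown.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical required elements; the real PEML schema is far richer.
REQUIRED = ("organism", "hypothesis", "machine_type")


def encode(experiment: dict) -> str:
    """Data-entry step: serialise a flat record as a PEML-like XML string."""
    root = ET.Element("experiment", id=experiment["id"])
    for key, value in experiment.items():
        if key != "id":
            ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def validate(xml_text: str) -> list:
    """Repository step: list the problems found in a submitted file."""
    root = ET.fromstring(xml_text)
    errors = [] if root.get("id") else ["missing id attribute"]
    errors += [f"missing <{tag}>" for tag in REQUIRED if root.find(tag) is None]
    return errors


def open_repository() -> sqlite3.Connection:
    """An in-memory relational store standing in for the repository database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE field (experiment_id TEXT, name TEXT, value TEXT)")
    return conn


def submit(conn: sqlite3.Connection, xml_text: str) -> list:
    """Validate first; insert rows only if the file passes every check."""
    errors = validate(xml_text)
    if not errors:
        root = ET.fromstring(xml_text)
        conn.executemany(
            "INSERT INTO field VALUES (?, ?, ?)",
            [(root.get("id"), child.tag, child.text) for child in root],
        )
    return errors


def extract(conn: sqlite3.Connection, experiment_id: str) -> str:
    """Export a stored experiment back out as a PEML-like file."""
    rows = conn.execute(
        "SELECT name, value FROM field WHERE experiment_id = ? ORDER BY rowid",
        (experiment_id,),
    ).fetchall()
    return encode({"id": experiment_id, **dict(rows)})
```

Validation sits in front of the insert, so the database only ever holds files that passed the checks. This follows the order of steps listed on the slide.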

8
INTEGRATION
9
Why integrate data?
  • "These 200 genes are up-regulated in my
    experiment. Are any of their protein products
    known to interact?"
  • Data are stored at many sites and in many formats.
  • Existing databases (MIPS, SGD, BIND, SCPD, KEGG)
    are designed mainly for browsing.
  • We need databases that allow complex queries.
  • They must also be easy for biologists to use.
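Answering the biologist's question above means bringing expression results and protein-interaction data together in one place. Here is a minimal Python sketch of that join. It assumes two hypothetical inputs:

- a gene-to-protein mapping;
- a list of interacting protein pairs, such as might be drawn from sources like those listed.

The identifiers in the usage line are placeholders, not real data.

```python
from typing import Dict, Iterable, List, Tuple


def interacting_products(
    up_regulated: Iterable[str],
    gene_to_protein: Dict[str, str],
    interactions: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Return known interactions in which both partners are encoded by up-regulated genes.

    Interactions are treated as undirected; self-interactions are ignored.
    """
    products = {gene_to_protein[g] for g in up_regulated if g in gene_to_protein}
    hits = {
        tuple(sorted((a, b)))
        for a, b in interactions
        if a != b and a in products and b in products
    }
    return sorted(hits)


# Placeholder identifiers for illustration only.
print(interacting_products(
    ["G1", "G2", "G3"],
    {"G1": "P1", "G2": "P2", "G3": "P3"},
    [("P2", "P1"), ("P3", "P9")],
))  # [('P1', 'P2')]
```

The function scans the interaction list once rather than testing every pair of genes, so its cost grows with the size of the interaction data.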

10
Genome Information Management System (GIMS)
Paton NW, Khan SA, Hayes A, Moussouni F, Brass A,
Eilbeck K, Goble CA, Hubbard SJ, Oliver SG
(2000) Conceptual modelling of genomic
information. Bioinformatics 16, 548-557.
11
GIMS
  • Integrates genomic and functional data.
  • Consists of two parts:
      • GIMS Database
      • GIMS User Interface

12
GIMS data warehouse
[Diagram labels: SGD, MIPS, maxD; Canned Queries, Browser, Analysis Library]
13
Database implementation
  • Uses the object database FastObjects.
  • All database classes and analysis programs are
    written in Java.
  • Allows close integration of the programming
    language with the database.
  • Allows fast access to database data from
    application programs.
  • Allows data to be stored in a way that reflects
    the underlying mechanisms in the organism.
  • Very flexible and extensible.
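What does it mean to store data "in a way that reflects the underlying mechanisms"? A minimal sketch helps, with plain Python objects standing in for persistent ones. The `Gene` and `Protein` classes here are hypothetical. GIMS itself used Java classes persisted in FastObjects, and its actual model is the one described in the Paton et al. (2000) paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # identity semantics, like persistent objects
class Protein:
    name: str
    location: str  # e.g. "nucleus"
    interacts_with: List["Protein"] = field(default_factory=list)


@dataclass(eq=False)
class Gene:
    name: str
    product: Protein


def partner_genes(gene: Gene, genes: List[Gene]) -> List[str]:
    """Navigate references: gene -> protein -> interacting proteins -> their genes."""
    # eq=False keeps the default identity-based hash, so objects can be dict keys.
    encoded_by = {g.product: g for g in genes}
    return [encoded_by[p].name for p in gene.product.interacts_with if p in encoded_by]
```

The query is a chain of attribute accesses rather than a series of joins. With an object database, the same navigation would run directly over stored objects.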

14
(No Transcript)
15
GIMS Contents
16
GIMS Contents
17
GIMS User Interface
  • Java application.
  • Can be downloaded from http://img.cs.man.ac.uk/gims
  • Communicates with the database via RMI.
  • On start-up, the application is sent information
    about database classes and canned queries.
  • Very flexible.
  • Allows the user to browse the database, ask canned
    queries, and store and combine data sets.
  • Can save results as txt, html or xml.
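Saving a result collection in the three formats listed is simple serialisation. A minimal Python sketch follows. The function name `save_results` and the exact output layouts are hypothetical, not those of the GIMS interface.

```python
import html
import xml.etree.ElementTree as ET
from typing import List


def save_results(genes: List[str], fmt: str) -> str:
    """Serialise a collection of gene names as plain text, HTML or XML."""
    if fmt == "txt":
        return "\n".join(genes) + "\n"
    if fmt == "html":
        # Escape names so characters such as "&" or "<" cannot break the markup.
        items = "".join(f"<li>{html.escape(g)}</li>" for g in genes)
        return f"<html><body><ul>{items}</ul></body></html>"
    if fmt == "xml":
        root = ET.Element("collection")
        for g in genes:
            ET.SubElement(root, "gene").text = g
        return ET.tostring(root, encoding="unicode")
    raise ValueError(f"unsupported format: {fmt}")
```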

18
(No Transcript)
19
Selecting Canned Queries
Query categories
Queries in the selected category
The store, initially empty
20
Parameterising a Query
Previously selected query
The parameters for this run select
down-regulated genes in the nucleus
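A canned query is a stored query with named parameters that the user fills in before each run. The sketch below is hypothetical throughout. Three things are assumptions for illustration:

- the `Observation` record;
- the log2-ratio threshold of -1 for "down-regulated";
- the catalogue layout.

None of these is the GIMS implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Observation:
    """One gene's measurement in one experiment (hypothetical layout)."""
    gene: str
    experiment: str
    log2_ratio: float
    location: str  # sub-cellular location of the gene product


@dataclass
class CannedQuery:
    name: str
    parameters: List[str]
    run: Callable[..., List[str]]


def down_regulated_in_location(data, experiment, location, threshold=-1.0):
    """Genes at or below `threshold` (log2 ratio) whose product is in `location`."""
    return sorted({
        o.gene for o in data
        if o.experiment == experiment and o.location == location and o.log2_ratio <= threshold
    })


# Queries grouped by category, as on the selection screen.
CATALOGUE: Dict[str, List[CannedQuery]] = {
    "Transcriptome": [
        CannedQuery("Down-regulated genes by location",
                    ["experiment", "location", "threshold"],
                    down_regulated_in_location),
    ],
}


def run_query(query: CannedQuery, data, store: Dict[str, List[str]], label: str, **params):
    """Reject unknown parameters, run the query, and keep the result in the store."""
    unknown = set(params) - set(query.parameters)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    store[label] = query.run(data, **params)
    return store[label]
```

Keeping the parameter names with the query lets the interface build its parameter form without knowing anything about the query body.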
21
Viewing the Results
Result collection
Operations on collections
22
Selecting a Second Query
23
Setting Its Parameters
The parameters for this run select
down-regulated genes in the same experiment that
are transcription factors
24
Obtaining Its Results
25
Inter-relating Results
Collections selected for the operation
Remove one result collection from the other
26
Result of Difference
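The operation shown, removing one result collection from another, is a set difference. A minimal Python sketch follows. The function name `combine` is hypothetical, and so are the union and intersection entries: only the difference appears on the slides. The gene identifiers are placeholders.

```python
from typing import Iterable, List

OPERATIONS = {
    "union": lambda a, b: a | b,
    "intersection": lambda a, b: a & b,
    "difference": lambda a, b: a - b,  # members of the first collection not in the second
}


def combine(first: Iterable[str], second: Iterable[str], operation: str) -> List[str]:
    """Apply a set operation to two stored result collections."""
    if operation not in OPERATIONS:
        raise ValueError(f"unsupported operation: {operation}")
    return sorted(OPERATIONS[operation](set(first), set(second)))


# Down-regulated nuclear genes, minus down-regulated transcription factors.
nuclear_down = ["G1", "G2", "G3"]
tf_down = ["G2", "G4"]
print(combine(nuclear_down, tf_down, "difference"))  # ['G1', 'G3']
```

Order matters for a difference. Swapping the arguments would instead return the down-regulated transcription factors that are not in the nuclear set.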
27
GIMS empowers the biologist
28
Resources at the centre
  • Workflows that could be used to generate this data
  • People who have registered an interest in this
    data
  • Related data
  • Provenance record of how the data was produced
  • Ontologies describing the data
29
Biologists at the centre
  • Workflows they wrote or used
  • People they collaborate with
30
myGrid
  • EPSRC UK e-Science pilot project.
  • Open Source Upper Middleware for Bioinformatics.
  • (Web) service-based architecture → Grid
    services.
  • 42-month project, currently 24 months in.
  • Prototype v1 release: Sept 2004; some services
    available now.

www.mygrid.org.uk
31
Workflows are in silico experiments
32
Application Workbench demonstrator
  • The myGrid service components are used in a
    demonstration application called the myGrid
    WorkBench, which provides a common point of use
    for the services.
  • We can select data from the myGrid Information
    repository (mIR), select a workflow based on its
    semantic description, and examine the results.
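Choosing a workflow "based on its semantic description" means matching two things against terms that describe each workflow: the kind of data you hold and the kind of result you want. The sketch below is a simplified, hypothetical illustration in Python. A tiny hand-written is-a hierarchy (`IS_A`) stands in for a real ontology. The concept names and the single `transcribe` workflow are invented for the example. None of this is the myGrid registry or its API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Optional

# Hypothetical is-a hierarchy: child concept -> parent concept.
IS_A: Dict[str, str] = {
    "genomic_dna_sequence": "dna_sequence",
    "dna_sequence": "nucleotide_sequence",
    "rna_sequence": "nucleotide_sequence",
}


def ancestors(concept: Optional[str]) -> Iterator[str]:
    """Yield the concept itself, then each more general concept above it."""
    while concept is not None:
        yield concept
        concept = IS_A.get(concept)


@dataclass
class Workflow:
    name: str
    input_concept: str
    output_concept: str
    steps: List[Callable[[str], str]]

    def run(self, data: str) -> str:
        for step in self.steps:  # each step's output feeds the next step
            data = step(data)
        return data


def find_workflows(registry: List[Workflow], have: str, want: str) -> List[Workflow]:
    """Workflows accepting `have` (or a generalisation) and producing `want` (or a specialisation)."""
    return [
        w for w in registry
        if w.input_concept in ancestors(have) and want in ancestors(w.output_concept)
    ]


REGISTRY = [Workflow("transcribe", "dna_sequence", "rna_sequence", [lambda s: s.replace("T", "U")])]
```

Matching through the hierarchy lets data typed as a more specific concept, such as `genomic_dna_sequence`, find workflows declared for the general `dna_sequence`.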

33
e-Science Provenance
  • Like a bench experiment, myGrid records the
    materials and methods used in an in silico
    experiment, in a provenance log.
  • This records where, what, when and how the
    experiment was run.
  • Derivation paths: workflows, queries
  • Annotations: notes
  • Evolution paths: workflow → workflow
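A provenance log entry can be modelled as a record of what went in and out, where, when and how. Derivation paths are then recovered by walking back from an output to the inputs that produced it. A minimal Python sketch follows. The `ProvenanceRecord` fields and the `lineage` helper are hypothetical and do not reproduce myGrid's provenance format.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class ProvenanceRecord:
    run_id: str
    workflow: str            # how: the workflow that was run
    services: List[str]      # where: the services it called
    started: datetime        # when
    inputs: List[str]        # what went in
    outputs: List[str]       # what came out
    evolved_from: Optional[str] = None  # evolution path: earlier workflow version
    annotations: List[str] = field(default_factory=list)  # free-text notes


def lineage(records: List[ProvenanceRecord], item: str) -> List[str]:
    """Derivation path: every data item that `item` was directly or indirectly derived from."""
    producer = {out: r for r in records for out in r.outputs}
    found: List[str] = []
    stack = [item]
    while stack:
        run = producer.get(stack.pop())
        if run is None:
            continue  # an original input, not produced by any recorded run
        for source in run.inputs:
            if source not in found:
                found.append(source)
                stack.append(source)
    return found
```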

34
e-Science Notification
  • A notification service can inform the mIR and
    the user (proxy) that data, workflows, services,
    etc. have changed, and thus prompt actions on
    data in the mIR.
  • Notifications are presented to the user by a
    client in the workbench environment.
  • Users register interest in notification
    topics.
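Registering interest in topics and being told when something changes is a publish/subscribe pattern. A minimal in-process sketch in Python follows. `NotificationService`, its method names and the topic strings are hypothetical, and a real deployment would deliver notifications over the network rather than by direct function calls.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[str, str], None]


class NotificationService:
    """In-memory publish/subscribe: subscribers register interest in topics."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def register_interest(self, topic: str, callback: Callback) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> int:
        """Deliver `message` to everyone interested in `topic`; return how many were told."""
        subscribers = self._subscribers.get(topic, [])  # .get does not create empty topics
        for callback in subscribers:
            callback(topic, message)
        return len(subscribers)
```

Both the information repository and a user's proxy could register as subscribers on the same topic, so one change can trigger an update in each.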

35
The myGrid Team
  • Matthew Addis, Nedim Alpdemir, Rich Cawley,
    Vijay Dialani, Alvaro Fernandes, Justin Ferris,
    Rob Gaizauskas, Kevin Glover, Carole Goble, Chris
    Greenhalgh, Mark Greenwood, Claire Jennings,
    Ananth Krishna, Xiaojian Liu, Darren Marvin,
    Karon Mee, Simon Miles, Luc Moreau, Juri Papay,
    Norman Paton, Simon Pearce, Steve Pettifer,
    Milena Radenkovic, Peter Rice, Angus Roberts,
    Alan Robinson, Martin Senger, Nick Sharman, Paul
    Watson, Anil Wipat and Chris Wroe.

36
Need GRID to empower the biologist