BioMart - PowerPoint PPT Presentation

Provided by: ebi3
1
BioMart
  • Data management for biology and beyond

Arek Kasprzyk, European Bioinformatics Institute
Lilly Singapore, September 2007
2
BioMart
  • A collaboration
  • European Bioinformatics Institute (EBI)
  • Cold Spring Harbor Laboratory (CSHL)
  • Aim
  • A generic data management system with a
    particular focus on supporting biological research

3
Biological databases
4
BioMart - synopsis
  • An out-of-the-box website backed by a relational
    database
  • A set of tools for organizing a data mining
    environment to suit a particular user community

5
Data Flow
[Diagram labels: Source data, Mart, Java, Perl]
6
Data warehouse
[Diagram labels: Mart, Java, Perl]
7
Federated
[Diagram labels: multiple Marts, Java, Perl]
8
User interfaces
9
Admin Tools
10
Basic concepts part 1
  • Dataset
  • Filter
  • Attribute

11
Examples
  • Upstream sequences
  • for all rat kinases
  • up-regulated in brain and associated with a
    QTL for a neurological disorder
  • Name, chromosome position, description
  • of all rat genes
  • located on chromosome 1, expressed in lung,
    associated with human homologues and
    non-synonymous SNP changes
  • Rent prices
  • for all the properties
  • available in the Bay Area from January 1st
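
Each example above splits into the same three parts: what to return (attributes), what to search (the dataset) and which conditions to apply (filters). A minimal Python sketch of the rent example follows. It is not BioMart code: the rows, the field names, the `run_query` helper and the year 2008 are all invented for illustration, and "from January 1st" is read here as "available on or after that date".

```python
from datetime import date

# Hypothetical dataset: every row and field name is invented for illustration.
properties = [
    {"address": "12 Oak St", "region": "Bay Area",
     "available_from": date(2008, 1, 1), "rent": 2100},
    {"address": "7 Pine Ave", "region": "Bay Area",
     "available_from": date(2007, 11, 15), "rent": 1800},
    {"address": "3 Elm Rd", "region": "Sacramento",
     "available_from": date(2008, 2, 1), "rent": 1200},
    {"address": "40 Bay Ct", "region": "Bay Area",
     "available_from": date(2008, 3, 1), "rent": 2500},
]


def run_query(dataset, filters, attributes):
    """Return the requested attributes for every row that passes all filters."""
    rows = [row for row in dataset if all(test(row) for test in filters)]
    return [tuple(row[name] for name in attributes) for row in rows]


# Filters: restrict the dataset (assumed year: 2008).
filters = [
    lambda row: row["region"] == "Bay Area",
    lambda row: row["available_from"] >= date(2008, 1, 1),
]

# Attributes: the columns to report for the rows that remain.
results = run_query(properties, filters, ["address", "rent"])
print(results)
```

The same shape applies to the biological examples: only the dataset, the filter tests and the attribute names change.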

12
Dataset, Filter, Attribute
13
Basic concepts - part 2
  • Dataset
  • Filter
  • Attribute
  • Exportable
  • Importable
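
One way to picture exportables and importables is as the two ends of a link between datasets: one dataset exports the values of a shared identifier, and the other imports them as a filter. The Python sketch below illustrates that idea only. It is not the BioMart implementation, and the rows, field names and helper functions are hypothetical.

```python
# Two hypothetical datasets that share a "pdb_id" field (all values invented).
genes = [
    {"gene_id": "gene_A", "chromosome": "1", "pdb_id": "1ABC"},
    {"gene_id": "gene_B", "chromosome": "1", "pdb_id": "2XYZ"},
    {"gene_id": "gene_C", "chromosome": "2", "pdb_id": "3DEF"},
]
structures = [
    {"pdb_id": "1ABC", "experiment_type": "NMR"},
    {"pdb_id": "2XYZ", "experiment_type": "X-ray"},
    {"pdb_id": "3DEF", "experiment_type": "NMR"},
]


def export_values(rows, exportable):
    """Exportable side: collect the linking values from the first dataset."""
    return {row[exportable] for row in rows}


def import_filter(rows, importable, values):
    """Importable side: keep rows of the second dataset matching those values."""
    return [row for row in rows if row[importable] in values]


# Filter the first dataset, then carry its identifiers across the link.
chromosome_1 = [g for g in genes if g["chromosome"] == "1"]
linked = import_filter(structures, "pdb_id", export_values(chromosome_1, "pdb_id"))

# Apply a filter that belongs to the second dataset.
nmr_ids = [s["pdb_id"] for s in linked if s["experiment_type"] == "NMR"]
print(nmr_ids)
```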

14
Federating datasets
15
Federating datasets
16
Web user interface
17
API
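use BioMart::Initializer;
use BioMart::Query;
use BioMart::QueryRunner;

# $confFile holds the path to the registry file and is defined elsewhere
my $initializer = BioMart::Initializer->new('registryFile' => $confFile);
my $registry = $initializer->getRegistry;

my $query = BioMart::Query->new('registry' => $registry,
                                'virtualSchemaName' => 'central_server_1');

# First dataset: Ensembl human genes on chromosome 1
$query->setDataset("hsapiens_gene_ensembl");
$query->addFilter("chromosome_name", ["1"]);
$query->addAttribute("ensembl_gene_id");
$query->addAttribute("ensembl_transcript_id");
$query->addAttribute("ensembl_peptide_id");

# Second dataset: MSD structures solved by NMR
$query->setDataset("msd");
$query->addFilter("experiment_type", ["NMR"]);
$query->addAttribute("pdb_id");
$query->addAttribute("resolution");
$query->addAttribute("release_date");
$query->addAttribute("header");

# Run the query and print the results
my $query_runner = BioMart::QueryRunner->new();
$query_runner->execute($query);
$query_runner->printResults();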
18
Web service

<Filter name="chromosome_name" value="1"/>
<Attribute name="ensembl_gene_id"/>
<Attribute name="ensembl_transcript_id"/>
<Attribute name="ensembl_peptide_id"/>

<Filter name="experiment_type" value="NMR"/>
<Attribute name="pdb_id"/>
<Attribute name="release_date"/>
<Attribute name="header"/>
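
The filter and attribute elements on this slide are fragments of a larger query document. The Python sketch below shows how such a document might be assembled with the standard library. The `Query` and `Dataset` wrapper elements and the `virtualSchemaName` attribute are assumptions based on the commonly documented BioMart query format, not something the slide shows. The dataset names and the schema name are borrowed from the API slide, and `build_query` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_query(datasets, virtual_schema):
    """Build a query document from (dataset, filters, attributes) triples.

    The Query/Dataset wrapper and the virtualSchemaName attribute are assumed.
    """
    query = ET.Element("Query", virtualSchemaName=virtual_schema)
    for dataset_name, filters, attributes in datasets:
        dataset = ET.SubElement(query, "Dataset", name=dataset_name)
        for filter_name, value in filters:
            ET.SubElement(dataset, "Filter", name=filter_name, value=value)
        for attribute_name in attributes:
            ET.SubElement(dataset, "Attribute", name=attribute_name)
    return ET.tostring(query, encoding="unicode")


query_xml = build_query(
    [
        ("hsapiens_gene_ensembl",
         [("chromosome_name", "1")],
         ["ensembl_gene_id", "ensembl_transcript_id", "ensembl_peptide_id"]),
        ("msd",
         [("experiment_type", "NMR")],
         ["pdb_id", "release_date", "header"]),
    ],
    virtual_schema="central_server_1",
)
print(query_xml)
```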
19
Deploying BioMart
  • Transformation
  • Configuration

20
Transformation
Mart
21
Mart schema: reversed star
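
A reversed star schema, as the term is usually described, keeps the entity being queried in a central main table and attaches dimension tables to it through a shared key, so that one main row can have several dimension rows. The sketch below illustrates that layout with SQLite. The table names, the `__main` and `__dm` suffixes, the column names and the data are all illustrative assumptions, not a schema taken from the presentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Central main table: one row per gene (hypothetical columns).
CREATE TABLE gene__main (
    gene_key   INTEGER PRIMARY KEY,
    name       TEXT,
    chromosome TEXT
);
-- Dimension table: zero or more cross-references per gene.
CREATE TABLE gene__xref__dm (
    gene_key  INTEGER REFERENCES gene__main(gene_key),
    db        TEXT,
    accession TEXT
);
INSERT INTO gene__main VALUES (1, 'geneA', '1'), (2, 'geneB', '2');
INSERT INTO gene__xref__dm VALUES
    (1, 'pdb', '1ABC'), (1, 'uniprot', 'P00001'), (2, 'pdb', '2XYZ');
""")

# A filter on the main table plus a filter on the dimension table,
# joined through the shared key.
rows = conn.execute("""
    SELECT m.name, d.accession
    FROM gene__main AS m
    JOIN gene__xref__dm AS d ON d.gene_key = m.gene_key
    WHERE m.chromosome = '1' AND d.db = 'pdb'
    ORDER BY m.name
""").fetchall()
print(rows)
```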
22
Transformation: MartBuilder
Source db
Mart db
23
MartBuilder
24
MartBuilder
25
MartBuilder
26
MartBuilder
27
Configuration: MartEditor
31
Dataset Configuration
32
Examples
34
Use Example 1
All genes in the human genome up-regulated in
Pancreatic Adenocarcinomas (PDACs) vs Normal
Pancreas (ND)
1. Filter
2. Attributes
3. Results
35
Use Example 2
All upstream sequences for all genes on
chromosome 1 up-regulated in Pancreatic
Adenocarcinomas (PDACs) vs Normal Pancreas (ND)
1. Filter
2. Attributes
3. Results
36
Use Example 3
I have just finished my experiment and would like
to find the overlaps between my results and those
reported in previous studies.
1. Filter
2. Attributes
3. Results
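
The overlap asked for in this example is, at its simplest, a set intersection between one list of gene identifiers and the lists reported elsewhere. A minimal Python sketch follows. The identifiers and study names are placeholders, and this is not how the BioMart interface computes the result.

```python
# Placeholder identifiers: none of these come from a real study.
my_results = {"gene_A", "gene_B", "gene_C", "gene_D"}
previous_studies = {
    "study_1": {"gene_B", "gene_C", "gene_X"},
    "study_2": {"gene_D", "gene_Y"},
}

# For each earlier study, keep the genes that also appear in my results.
overlaps = {
    study: sorted(my_results & genes)
    for study, genes in previous_studies.items()
}
print(overlaps)
```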
37
Web service
38
Perl
39
DAS
40
Bioconductor package biomaRt
41
Galaxy
42
Taverna
43
Central Server (www.biomart.org)
44
www.biomart.org/biomart/martservice
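
A request to the martservice address above might be put together as in the Python sketch below. The `http://` scheme, the `type=registry` parameter and the `query` parameter are assumptions recalled from BioMart's web service documentation and should be checked against it. `martservice_url` is a hypothetical helper, and no request is actually sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Address from the slide; the scheme is an assumption.
MARTSERVICE = "http://www.biomart.org/biomart/martservice"


def martservice_url(**params):
    """Append URL-encoded parameters to the martservice address."""
    return MARTSERVICE + "?" + urlencode(params)


# Assumed parameter for listing the registry.
registry_url = martservice_url(type="registry")

# Assumed parameter for submitting an XML query document.
xml_query = '<Query><Dataset name="msd"><Attribute name="pdb_id"/></Dataset></Query>'
query_url = martservice_url(query=xml_query)

print(registry_url)
print(query_url)
```

Encoding the XML with `urlencode` keeps the angle brackets and quotes from breaking the URL.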
45
Genomic data
46
Uniprot, MSD, ArrayExpress
47
Model organism databases
48
Developmental models
49
Proteomics
50
Target SNP selection for the study of type 1
diabetes (T1D), malaria and dengue
Name   Fragment   Position   Alleles   Strand
SNP1   AL139258   1659852    T/A       1
SNP2   NT_25698   2569873    C/T       -1
SNP3   chr13      1125698    C/G       1
Genetics of Infectious and Autoimmune Diseases,
Pasteur Institute, INSERM U730, Paris, France.
51
CAPRISA: understanding HIV pathogenesis and
epidemiology as well as HIV/AIDS treatment and
prevention
Clinical Data
52
Unilever
  • Human study to evaluate Omics in assessing safety
    indicators
  • Study of skin inflammation in response to
    detergent
  • Skin samples taken and analyzed with multiple
    Omics techniques.
  • Blood
  • Skin biopsy
  • Microdialysis

53
Future plans
54
New configuration system
  • Scalability
  • Updates and maintenance of large configurations
  • Runtime server scalability (cache and memory)
  • Scalable for multiple mart users (single instance
    - security)
  • Scalable for alternative configurations (new
    MartGUI framework)

55
New MartGUI framework
  • No more one size fits all approach
  • Alternative GUIs and configurations
  • No more tabulated data only
  • Visualization and analysis tools

56
New GUI framework
[Mockup: site header; links Home, Gene Id conversion, Functional annotation, Compare two gene lists, Analyze gene list, Draw distribution, Draw bla bla chart, Advanced search; page text "Welcome to my data mining website"]
57
New GUI framework
[Mockup: site header; links Home, Gene Id conversion, Functional annotation, Compare two gene lists, Analyze gene list, Draw distribution, Draw bla bla chart, Advanced search; form with "paste your ids here", options Hugo, Genbank, Trembl, Uniprot, and a Submit button]
58
New GUI framework
[Mockup: links Fu, Home, Advanced search, Gene Id converter; page text "Welcome to my data mining website"]
59
New GUI framework
[Mockup: links Fu, Home, Advanced search, Gene Id conversion; form with "paste your ids here", options Genbank, Hugo, Uniprot, Swissprot, and a Submit button]
60
Cytogenetic distribution of pancreatic cancer
genes satisfying my query (histogram)
61
Cytogenetic distribution of pancreatic cancer
genes satisfying my query (ideogram)
62
Cytogenetic distribution of chromosomal
aberrations in pancreatic cancer
64
New GUI framework
65
New GUI framework
66
New configuration tool
  • MartConfigurator
  • Handles a complete registry object
  • Defines GUI units
  • Automated service discovery
  • Manual link override
  • Automated updates for large configurations
  • Improved user interaction

67
Summary
  • A set of tools for creating your own tailor-made
    data management environment, complete with an
    out-of-the-box website
  • Open source
  • Misleading name

68
Credits
  • Martians
  • Syed Haider
  • Richard Holland
  • Damian Smedley
  • Contributors
  • Steffen Durinck (NCI, NIH)
  • Eric Just (Northwestern University)
  • Don Gilbert (Indiana University)
  • Darin London (Duke University)
  • Will Spooner (CSHL)
  • Gudmundur Thorisson (CSHL)
  • Benoit Ballester (Université de la Méditerranée)
  • James Smith (Ensembl)
  • Arne Stabenau (Ensembl)
  • Andreas Kahari (Ensembl)
  • Craig Melsopp (Ensembl)
  • Katerina Tzouvara (EBI)
  • Paul Donlon (Unilever)
