Title: BioMart
1BioMart
- Data management for biology and beyond
Arek Kasprzyk, European Bioinformatics Institute
Lilly Singapore, September 2007
2BioMart
- A collaboration
- European Bioinformatics Institute (EBI)
- Cold Spring Harbor Laboratory (CSHL)
- Aim
- A generic data management system with a
particular focus on supporting biological research
3Biological databases
4BioMart - synopsis
- An out-of-the-box website backed by a relational database
- A set of tools for organizing a data mining environment to suit a particular user community
5Data Flow
[Diagram labels: Source data, Mart, Java, Perl]
6Data warehouse
[Diagram labels: Mart, Java, Perl]
7Federated
[Diagram labels: eight Marts, Java, Perl]
8User interfaces
9Admin Tools
10Basic concepts - part 1
11Examples
- Upstream sequences
- for all rat kinases
- up-regulated in brain and associated with a QTL for a neurological disorder
- Name, chromosome position, description
- of all rat genes
- located on chromosome 1, expressed in lung, associated with human homologues and non-synonymous SNP changes
- Rent prices
- for all the properties
- available in the Bay Area from January 1st
12Dataset, Filter, Attribute
13Basic concepts - part 2
- Dataset
- Filter
- Attribute
- Exportable
- Importable
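As a rough illustration of how the first three concepts fit together, the sketch below treats a dataset as a list of rows, filters as restrictions on which rows come back, and attributes as the columns to output. The rows, the field names, and the `run_query` helper are hypothetical and chosen only for illustration; this is not how BioMart itself is implemented.

```python
# Hypothetical in-memory "dataset": each row is one gene record.
GENES = [
    {"gene_id": "G1", "chromosome": "1", "tissue": "lung"},
    {"gene_id": "G2", "chromosome": "2", "tissue": "lung"},
    {"gene_id": "G3", "chromosome": "1", "tissue": "brain"},
]


def run_query(dataset, filters, attributes):
    """Return the requested attributes for every row matching all filters."""
    results = []
    for row in dataset:
        # A filter restricts the rows: every (name, value) pair must match.
        if all(row.get(name) == value for name, value in filters.items()):
            # An attribute selects which columns appear in the output.
            results.append(tuple(row[name] for name in attributes))
    return results


rows = run_query(
    GENES,
    {"chromosome": "1", "tissue": "lung"},
    ["gene_id", "chromosome"],
)
```

Only the row that passes both filters is returned, reduced to the two requested attributes.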
14Federating datasets
15Federating datasets
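One way to picture federation is as a link between two independent datasets: the first exports a set of identifiers (an exportable) and the second accepts them as an additional filter (an importable). The sketch below is only a toy model under that assumption; the rows, identifiers, and the `federate` helper are all hypothetical.

```python
# Hypothetical rows for two independent datasets.
GENE_DATASET = [
    {"gene_id": "G1", "chromosome": "1", "uniprot_id": "P001"},
    {"gene_id": "G2", "chromosome": "1", "uniprot_id": "P002"},
    {"gene_id": "G3", "chromosome": "2", "uniprot_id": "P003"},
]
STRUCTURE_DATASET = [
    {"pdb_id": "1abc", "uniprot_id": "P001", "experiment_type": "NMR"},
    {"pdb_id": "2def", "uniprot_id": "P002", "experiment_type": "X-ray"},
    {"pdb_id": "3ghi", "uniprot_id": "P003", "experiment_type": "NMR"},
]


def matches(row, filters):
    """True if the row satisfies every (name, value) filter."""
    return all(row.get(name) == value for name, value in filters.items())


def federate(first, first_filters, export_key, second, second_filters, import_key):
    """Link two datasets: values exported by the first restrict the second."""
    # Exportable: linking values taken from rows that pass the first filters.
    exported = {row[export_key] for row in first if matches(row, first_filters)}
    # Importable: the second dataset accepts those values as an extra filter.
    return [
        row
        for row in second
        if row[import_key] in exported and matches(row, second_filters)
    ]


linked = federate(
    GENE_DATASET, {"chromosome": "1"}, "uniprot_id",
    STRUCTURE_DATASET, {"experiment_type": "NMR"}, "uniprot_id",
)
```

Here the structures are limited to those linked to chromosome 1 genes and then filtered by experiment type, so only `1abc` remains.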
16Web user interface
17API
use BioMart::Initializer;
use BioMart::Query;
use BioMart::QueryRunner;

# Load the registry from the configuration file
my $initializer = BioMart::Initializer->new('registryFile' => $confFile);
my $registry = $initializer->getRegistry;

my $query = BioMart::Query->new('registry' => $registry, 'virtualSchemaName' => 'central_server_1');

# First dataset: human genes on chromosome 1
$query->setDataset("hsapiens_gene_ensembl");
$query->addFilter("chromosome_name", "1");
$query->addAttribute("ensembl_gene_id");
$query->addAttribute("ensembl_transcript_id");
$query->addAttribute("ensembl_peptide_id");

# Second dataset: NMR structures from MSD
$query->setDataset("msd");
$query->addFilter("experiment_type", "NMR");
$query->addAttribute("pdb_id");
$query->addAttribute("resolution");
$query->addAttribute("release_date");
$query->addAttribute("header");

# Run the query and print the results
my $query_runner = BioMart::QueryRunner->new();
$query_runner->execute($query);
$query_runner->printResults();
18Web service
<Filter name="chromosome_name" value="1"/>
<Attribute name="ensembl_gene_id"/>
<Attribute name="ensembl_transcript_id"/>
<Attribute name="ensembl_peptide_id"/>

<Filter name="experiment_type" value="NMR"/>
<Attribute name="pdb_id"/>
<Attribute name="release_date"/>
<Attribute name="header"/>
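The filter and attribute elements above can be assembled programmatically. The sketch below uses Python's standard `xml.etree.ElementTree` and assumes that each group sits inside a `Dataset` element and that both are wrapped in a `Query` element carrying the virtual schema name; that envelope and the `build_query` helper are assumptions for illustration, not taken from the slide.

```python
import xml.etree.ElementTree as ET


def build_query(virtual_schema, datasets):
    """Build a query document from (dataset_name, filters, attributes) tuples."""
    # Assumed envelope: a Query element holding one Dataset element per dataset.
    query = ET.Element("Query", {"virtualSchemaName": virtual_schema})
    for name, filters, attributes in datasets:
        dataset = ET.SubElement(query, "Dataset", {"name": name})
        for filter_name, value in filters.items():
            ET.SubElement(dataset, "Filter", {"name": filter_name, "value": value})
        for attribute_name in attributes:
            ET.SubElement(dataset, "Attribute", {"name": attribute_name})
    # encoding="unicode" returns a str rather than bytes.
    return ET.tostring(query, encoding="unicode")


xml_query = build_query(
    "central_server_1",
    [
        ("hsapiens_gene_ensembl", {"chromosome_name": "1"},
         ["ensembl_gene_id", "ensembl_transcript_id", "ensembl_peptide_id"]),
        ("msd", {"experiment_type": "NMR"},
         ["pdb_id", "release_date", "header"]),
    ],
)
```

Building the document with an XML library rather than string concatenation keeps attribute values correctly escaped.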
19Deploying BioMart
- Transformation
- Configuration
20Transformation
Mart
21Mart Schema: reversed star
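A reversed star layout can be pictured as one central main table with one row per object, surrounded by dimension tables that hold one-to-many data and point back to the main table through a shared key. The sketch below builds such a layout in an in-memory SQLite database; the table names, column names, and data are hypothetical and only loosely follow a main/dimension naming style.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Central main table: one row per gene.
    CREATE TABLE gene__main (
        gene_id_key INTEGER PRIMARY KEY,
        name TEXT,
        chromosome TEXT
    );
    -- Dimension table: zero or more rows per gene, keyed to the main table.
    CREATE TABLE gene__expression__dm (
        gene_id_key INTEGER REFERENCES gene__main(gene_id_key),
        tissue TEXT
    );
    INSERT INTO gene__main VALUES
        (1, 'geneA', '1'), (2, 'geneB', '1'), (3, 'geneC', '2');
    INSERT INTO gene__expression__dm VALUES
        (1, 'lung'), (1, 'brain'), (3, 'lung');
""")

# One filter on the main table and one on a dimension table, joined by the key.
names = [
    row[0]
    for row in conn.execute(
        """
        SELECT m.name
        FROM gene__main AS m
        JOIN gene__expression__dm AS d ON d.gene_id_key = m.gene_id_key
        WHERE m.chromosome = ? AND d.tissue = ?
        ORDER BY m.name
        """,
        ("1", "lung"),
    )
]
```

Because every dimension table joins directly to the main table, a query touching several dimensions needs only one join per dimension.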
22Transformation: MartBuilder
Source db
Mart db
23MartBuilder
24MartBuilder
25MartBuilder
26MartBuilder
27Configuration: MartEditor
31Dataset Configuration
32Examples
34Use Example 1
All genes in the human genome up-regulated in
Pancreatic Adenocarcinomas (PDACs) vs Normal
Pancreas (ND)
1. Filter
2. Attributes
3. Results
35Use Example 2
All upstream sequences for all genes on
chromosome 1 up-regulated in Pancreatic
Adenocarcinomas (PDACs) vs Normal Pancreas (ND)
1. Filter
2. Attributes
3. Results
36Use Example 3
I have just finished my experiment and would like to get
the overlaps between my results and those
reported in previous studies.
1. Filter
2. Attributes
3. Results
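Conceptually, the overlap in this example is a set intersection between the user's identifiers and those reported by each earlier study. The sketch below shows that step in isolation; the gene identifiers, study names, and the `overlap` helper are hypothetical.

```python
def overlap(my_results, previous_studies):
    """Map each previous study to the identifiers it shares with my results."""
    mine = set(my_results)
    return {
        study: sorted(mine & set(ids))
        for study, ids in previous_studies.items()
    }


# Hypothetical identifiers from my experiment and from earlier studies.
my_genes = ["GENE_A", "GENE_B", "GENE_C"]
reported = {
    "study_1": ["GENE_B", "GENE_X"],
    "study_2": ["GENE_C", "GENE_A"],
}
shared = overlap(my_genes, reported)
```

Sorting the shared identifiers makes the result independent of the order in which each list was supplied.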
37Web service
38Perl
39DAS
40Bioconductor package: biomaRt
41Galaxy
42Taverna
43Central Server (www.biomart.org)
44www.biomart.org/biomart/martservice
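Requests to the martservice endpoint can be composed as ordinary URLs. The sketch below only builds the URLs and performs no network access; the `http://` scheme, the `type` and `query` parameter names, and the `martservice_url` helper are assumptions for illustration and should be checked against the server's documentation.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint from the slide; the http:// scheme is an assumption.
MARTSERVICE = "http://www.biomart.org/biomart/martservice"


def martservice_url(**params):
    """Return the endpoint URL with the given parameters percent-encoded."""
    return MARTSERVICE + "?" + urlencode(params)


# Assumed parameter names: "type" for metadata requests, "query" for XML queries.
registry_url = martservice_url(type="registry")
query_url = martservice_url(query='<Query virtualSchemaName="default"></Query>')

# Decoding the query string recovers the original XML unchanged.
decoded = parse_qs(urlsplit(query_url).query)["query"][0]
```

Encoding with `urlencode` matters here because the XML contains characters such as `<`, `"`, and `=` that are not safe in a raw query string.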
45Genomic data
46Uniprot, MSD, ArrayExpress
47Model organism databases
48 Developmental models
49Proteomics
50Target SNP selection for the study of type 1
diabetes (T1D), malaria and dengue
Name  Fragment  Position  Alleles  Strand
SNP1  AL139258  1659852   T/A      1
SNP2  NT_25698  2569873   C/T      -1
SNP3  chr13     1125698   C/G      1
Genetics of Infectious and Autoimmune Diseases,
Pasteur Institute, INSERM U730, Paris, France.
51CAPRISA: understanding HIV pathogenesis and
epidemiology as well as HIV/AIDS treatment and
prevention
Clinical Data
52Unilever
- Human study to evaluate Omics in assessing safety indicators
- Study of skin inflammation in response to detergent
- Skin samples taken and analyzed with multiple Omics techniques
- Blood
- Skin biopsy
- Microdialysis
53Future plans
54New configuration system
- Scalability
- Updates and maintenance of large configurations
- Run time server scalability (cache and memory)
- Scalable for multiple mart users (single instance - security)
- Scalable for alternative configurations (new MartGUI framework)
55New MartGUI framework
- No more one size fits all approach
- Alternative GUIs and configurations
- No more tabulated data only
- Visualization and analysis tools
56New GUI framework
[Mockup: site header; menu: Home, Gene Id conversion, Functional annotation, Compare two gene lists, Analyze gene list, Draw distribution, Draw bla bla chart, Advanced search; page text: "Welcome to my data mining website"]
57New GUI framework
[Mockup: site header; menu: Home, Gene Id conversion, Functional annotation, Compare two gene lists, Analyze gene list, Draw distribution, Draw bla bla chart, Advanced search; Gene Id conversion form: "paste your ids here", options Hugo, Genbank, Trembl, Uniprot; Submit button]
58New GUI framework
[Mockup: Fu; menu: Home, Advanced search, Gene Id converter; page text: "Welcome to my data mining website"]
59New GUI framework
[Mockup: Fu; menu: Home, Advanced search, Gene Id conversion; form: "paste your ids here", options Genbank, Hugo, Uniprot, Swissprot; Submit button]
60Cytogenetic distribution of pancreatic cancer
genes satisfying my query (histogram)
61Cytogenetic distribution of pancreatic cancer
genes satisfying my query (ideogram)
62Cytogenetic distribution of chromosomal
aberrations in pancreatic cancer
64New GUI framework
65New GUI framework
66New configuration tool
- MartConfigurator
- Handles a complete registry object
- Defines GUI units
- Automated service discovery
- Manual link override
- Automated updates for large configurations
- Improved user interaction
67Summary
- A set of tools for creating your own tailor-made data management environment, complete with an out-of-the-box website
- Open source
- Misleading name
68Credits
- Martians
- Syed Haider
- Richard Holland
- Damian Smedley
- Contributors
- Steffen Durinck (NCI, NIH)
- Eric Just (Northwestern University)
- Don Gilbert (Indiana University)
- Darin London (Duke University)
- Will Spooner (CSHL)
- Gudmundur Thorisson (CSHL)
- Benoit Ballester (Universite de la Mediterranee)
- James Smith (Ensembl)
- Arne Stabenau (Ensembl)
- Andreas Kahari (Ensembl)
- Craig Melsopp (Ensembl)
- Katerina Tzouvara (EBI)
- Paul Donlon (Unilever)