Title: Archiving and integrating biodiversity data: the MBLab experience
1 Archiving and integrating biodiversity data: the MBLab experience
Molecular Biodiversity Laboratory
Istituto di Tecnologie Biomediche (Bari)
Molecular Biodiversity: basic concepts, technologies, applications. Rome, 10 July 2009
2 Molecular Biodiversity Laboratory
MBLab is a private-public research initiative involved in the study and use of molecular biodiversity. It aims to build novel bioinformatic systems applied both to human health, in order to monitor safety and risks, and to the agro-industrial sector, in order to trace products along food production and supply chains.
The project is co-funded by the Italian Ministry of Research (MIUR) and has a 7M budget (Fondo FAR - Legge 297/1999 Art. 12/lab, Project Grant DM19410).
- Organization
- Lab Director: Luigi Di Pace (IBM)
- Scientific Board
- Prof. Annamaria Colao (University Federico II of Napoli)
- Prof. Floriana Esposito (University of Bari)
- Dott. Pietro Leo (IBM)
- Prof. Graziano Pesole (University of Bari)
- Prof. Cecilia Saccone (CNR-ITB, University of Bari)
- Dott. Angelo Visconti (CNR-ISPA)
Ministero dell'Istruzione, dell'Università e della Ricerca
Provincia autonoma di Trento
3 Molecular Biodiversity Lab partnership and Strategy
Mixing and balancing public and private resources, in an Open Research Collaboration style, to build an operational research network that provides high-value services and products based on molecular biodiversity
4 The Molecular Biodiversity Lab aims to develop a
number of molecular diagnostic tools and
bioinformatics resources targeting both human
safety and agricultural applications, by
exploiting a number of molecular biodiversity
techniques and know-how
Molecular Biodiversity
DNA Barcode
Molecular diagnostic tools and resources
5 High-level overview of the MBLab IT Platform
(component model)
non-grid env.
grid env.
Gene meta-engine interface
Semantic search engine
Advanced Query System
Data submission update System
Applications
Workflow designer
Semantic Indexer
Gene search meta-engine
New bioinformatic algorithms
Text Analytics algorithms
Semantic Annotator
Data Mining tools
Open bioinformatic algorithms
Dictionaries
Biodiversity Ontology
Infrastructure
Data Update Engine
Data Federator
Data Service Hub
Semantic Indexes
MBLab Data Warehouse
Biodiversity related data
Literature
Genomic
Taxonomic
Collections /Biobanks
Genomic
Taxonomic
Bio Images
PRIVATE
PUBLIC
6 Experiment management tools: AFLP gel electrophoresis patterns, electropherograms
Semantic search engine: Ontology
Bioinformatics analytics tools: cluster analysis, simple and multiple alignment, polymorphism analysis, phylogenetic clustering
Applications
Collections
MBLAB Integrated Database
Literature
Taxonomy
Metagenomics
GeoSpatial
Collections
Camera
UNIBA ITB
IGV
ISPA IVV IGV
Sequences
Annotations
Phenotype
Expression
MolecularData
ISPA IVV
ISPA IVV
ISPA IVV IGV
ISPA IVV
cpnDB
ITIS
Broad
TAIR
GBIF
Species2000
TIGR
GDR
ShiBase
A.Oryzae ESTdb
GrapeWine Genomics
MIPS
LBSN
JGI
VFDB
NCBI / EMBL
MLST
TrED
Public
7 High-Level Conceptual Schema
- STANDARDS
- BOLD (Barcode Of Life Data System) schema to interrelate collection and taxonomic data.
- MolecularData section inspired by the Chado schema, which underlies many applications of GMOD (Generic Model Organism Database), a collection of open source software tools for creating and managing genome-scale biological databases.
- It uses the Sequence Ontology (Open Biomedical Ontologies) to define and relate all the annotations and attributes on sequences (features).
- Geospatial data: use of NUTS (Nomenclature of Territorial Units for Statistics) to resolve toponyms into coordinates.
Simple relationship
Generalization
Many-to-many relationship
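The idea of typing sequence annotations with ontology terms, as in the Chado-inspired MolecularData section above, can be illustrated with a minimal sketch. The table and column names below are loosely modelled on Chado's `cvterm`, `feature` and `featureloc` tables and are heavily simplified; they are assumptions for illustration and not the actual MBLab schema. The feature names (`demo_seq_1` etc.) and coordinates are hypothetical, and SQLite stands in for whatever DBMS the project uses.

```python
import sqlite3

# Simplified, Chado-like layout (an assumption, not the MBLab schema):
# every feature is typed by a controlled-vocabulary term and can be
# located on another feature (its source sequence).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    name      TEXT UNIQUE NOT NULL          -- Sequence Ontology term name
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
);
CREATE TABLE featureloc (
    feature_id    INTEGER NOT NULL REFERENCES feature(feature_id),
    srcfeature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    fmin          INTEGER NOT NULL,
    fmax          INTEGER NOT NULL,
    strand        INTEGER NOT NULL
);
""")


def term_id(name):
    """Return the id of an ontology term, creating the row if needed."""
    conn.execute("INSERT OR IGNORE INTO cvterm (name) VALUES (?)", (name,))
    row = conn.execute("SELECT cvterm_id FROM cvterm WHERE name = ?", (name,))
    return row.fetchone()[0]


def add_feature(uniquename, so_term):
    """Insert a feature typed by a Sequence Ontology term name."""
    cur = conn.execute(
        "INSERT INTO feature (uniquename, type_id) VALUES (?, ?)",
        (uniquename, term_id(so_term)),
    )
    return cur.lastrowid


def locate(feature_id, src_id, fmin, fmax, strand=1):
    """Place a feature on its source sequence."""
    conn.execute(
        "INSERT INTO featureloc VALUES (?, ?, ?, ?, ?)",
        (feature_id, src_id, fmin, fmax, strand),
    )


def features_on(src_uniquename, so_term):
    """List features of a given ontology type located on a source sequence."""
    rows = conn.execute(
        """
        SELECT f.uniquename, l.fmin, l.fmax
        FROM featureloc l
        JOIN feature f ON f.feature_id = l.feature_id
        JOIN feature s ON s.feature_id = l.srcfeature_id
        JOIN cvterm  t ON t.cvterm_id  = f.type_id
        WHERE s.uniquename = ? AND t.name = ?
        ORDER BY l.fmin
        """,
        (src_uniquename, so_term),
    )
    return rows.fetchall()


# Hypothetical demo data: one sequence carrying a gene and a CDS.
seq = add_feature("demo_seq_1", "region")
gene = add_feature("demo_gene_1", "gene")
cds = add_feature("demo_cds_1", "CDS")
locate(gene, seq, 10, 500)
locate(cds, seq, 50, 450)
```

Because the feature type is a foreign key into the vocabulary table rather than a free-text column, a query such as `features_on("demo_seq_1", "CDS")` can select annotations by ontology term without string matching on ad hoc labels.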
8 Integration/Federation of private Collections/DBs
Integration/Federation Layer
New Collections/DBs
Virus IVV-DB
Italian Biodiversity Network
9 Italian Biodiversity Network: still an informal entity
- While waiting for the official National Network of Biodiversity, several Italian groups are interested in sharing their data in an integrated system like MBLab.
- Involved institutions
- Osservatorio della Biodiversità del Lazio (observations and sequences from Metazoa in Latium): UniRoma2 and UniRoma1
- ZooPlant Lab (DBs, collections and sequences on Filaria and free-living nematodes, and Italian bats and birds): U. Bicocca
- ForBol (DBs, collections and sequences on forest plants of the Mediterranean basin): U. Tuscia, UniFi, UniCal, UniBa
10 ETL: Integrating GenBank in the MBLAB Database
flat file
sequence and annotation properties
Parser
sequence nucleotides "gtatggg..ttctaga...
header locus "HSU50398
length "1191...
reference authors Erzurum....
title ...... ......
Reasoner for Ordered SQL
GenBank
INSERT INTO SEQUENCE...
INSERT INTO SEQUENCE FEATURE..
INSERT INTO REFERENCE...
INSERT INTO REFERENCE PORTION...
MBLab
DB Filler
ordered sql statements
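The pipeline on this slide (flat file, parser, reasoner for ordered SQL, DB filler) can be sketched as follows. This is a minimal illustration under stated assumptions, not the MBLab implementation: the flat-file record `DEMO0001` is a hypothetical, heavily simplified GenBank-like entry, the table layout (`sequence`, `sequence_feature`, `reference`) and its columns are invented for the example, and the "reasoner" is reduced to a topological sort of tables by foreign-key dependency so that parent rows are inserted before child rows.

```python
import sqlite3
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical, simplified GenBank-like record (not a real entry).
FLAT_FILE = """\
LOCUS       DEMO0001     12 bp    DNA
REFERENCE   1
  AUTHORS   Doe,J.
  TITLE     Hypothetical demo record
FEATURES             Location/Qualifiers
     gene            1..12
ORIGIN
        1 acgtacgtac gt
//
"""

# Assumed target schema; foreign keys make insertion order matter.
SCHEMA = """
CREATE TABLE sequence (locus TEXT PRIMARY KEY, length INTEGER, nucleotides TEXT);
CREATE TABLE sequence_feature (
    locus TEXT NOT NULL REFERENCES sequence(locus), key TEXT, location TEXT);
CREATE TABLE reference (
    locus TEXT NOT NULL REFERENCES sequence(locus), authors TEXT, title TEXT);
"""

# Table -> tables it depends on (must be filled first).
DEPENDS_ON = {
    "sequence": set(),
    "sequence_feature": {"sequence"},
    "reference": {"sequence"},
}


def parse(text):
    """Parser: turn the flat file into a dict of properties.
    Feature qualifier lines (/key=value) are not handled in this sketch."""
    record = {"references": [], "features": [], "nucleotides": ""}
    section = None
    for line in text.splitlines():
        words = line.split()
        if not words or words[0] == "//":
            continue
        if not line.startswith(" "):  # top-level keyword starts in column 1
            section = words[0]
            if section == "LOCUS":
                record["locus"], record["length"] = words[1], int(words[2])
            elif section == "REFERENCE":
                record["references"].append({"authors": "", "title": ""})
        elif section == "REFERENCE" and words[0] in ("AUTHORS", "TITLE"):
            record["references"][-1][words[0].lower()] = " ".join(words[1:])
        elif section == "FEATURES":
            record["features"].append({"key": words[0], "location": words[1]})
        elif section == "ORIGIN":
            record["nucleotides"] += "".join(words[1:])  # drop position number
    return record


def statements(record):
    """Emit (table, sql, params) tuples, deliberately children first."""
    locus = record["locus"]
    out = [("reference", "INSERT INTO reference VALUES (?, ?, ?)",
            (locus, ref["authors"], ref["title"]))
           for ref in record["references"]]
    out += [("sequence_feature", "INSERT INTO sequence_feature VALUES (?, ?, ?)",
             (locus, feat["key"], feat["location"]))
            for feat in record["features"]]
    out.append(("sequence", "INSERT INTO sequence VALUES (?, ?, ?)",
                (locus, record["length"], record["nucleotides"])))
    return out


def ordered(stmts):
    """Reasoner stand-in: sort statements so parent tables come first."""
    order = TopologicalSorter(DEPENDS_ON).static_order()
    rank = {table: i for i, table in enumerate(order)}
    return sorted(stmts, key=lambda s: rank[s[0]])


def fill(conn, stmts):
    """DB filler: execute the ordered statements in one transaction."""
    with conn:
        for _, sql, params in stmts:
            conn.execute(sql, params)


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce parent-before-child
conn.executescript(SCHEMA)
record = parse(FLAT_FILE)
fill(conn, ordered(statements(record)))
```

With foreign keys enforced, executing the statements in the order produced by `statements` alone would fail on the first child row; sorting them by dependency first is what lets the filler stay a plain loop.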
11 MBLab private collections
12 Private Molecular Data collected in CNR-ISPA (until April 2009)
13 Private Collection Data in CNR-IGV
11,113 accessions
Biodiversity data of cultivated plant species and their wild progenitors
14 MBLab integrated biodiversity database TEAM
- ITB (CNR)
- Flavio Licciulli (flavio.licciulli_at_ba.itb.cnr.it)
- Domenica D'Elia
- Andreas Gisel
- Saverio Vicario
- IBM
- Gaetano Scioscia
- Graziano Pappadà
- Giovanni Esposito
- Paolo Pannarale
- ISPA (CNR)
- Antonella Susca
- Giuseppina Mulè
- IGV (CNR)
- Domenico Catalano
Thanks !!!