Title: GERMINATE: A Plant Data Management System
1. GERMINATE: A Plant Data Management System
- Jennifer Lee (UoD)
- Guy Davenport (JIC)
- Andy Flavell (UoD)
- Dave Marshall (SCRI)
- Robbie Waugh (SCRI)
- Jo Dicks (JIC)
- Noel Ellis (JIC)
- Mike Ambrose (JIC)
- Theo van Hintum (CGN)
2. GERMINATE Project
- http://bioinf.scri.sari.ac.uk/germinate/
- Developed in PostgreSQL
- Expected to be released as open source under the GNU General Public License
- Core of the system is based on the FAO/IPGRI Multi-Crop Passport Descriptors
- Additional generic tables accommodate other data types
3. GERMINATE Project
4. The Database
- Designed to hold potentially any type of data associated with plants.
- The accession is the entry point to all data, which permits complex queries across very different datasets.
- Comments related to any item in the database can be stored and accessed easily.
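The accession-centred design can be sketched with Python's built-in sqlite3 module standing in for PostgreSQL. Apart from accession_id and accenumb, the table and column names are hypothetical and all values are invented: two unrelated datasets are combined in one query purely by joining each to the accession, and a single generic comments table annotates a row in any table.

```python
import sqlite3

# Hypothetical, heavily simplified schema (not the actual GERMINATE tables).
SCHEMA = """
CREATE TABLE accessions (
    accession_id INTEGER PRIMARY KEY,
    accenumb     TEXT NOT NULL UNIQUE
);
CREATE TABLE phenotype_data (      -- hypothetical trait dataset
    accession_id INTEGER NOT NULL REFERENCES accessions(accession_id),
    trait        TEXT NOT NULL,
    value        REAL
);
CREATE TABLE genotype_data (       -- hypothetical marker dataset
    accession_id INTEGER NOT NULL REFERENCES accessions(accession_id),
    marker       TEXT NOT NULL,
    allele       TEXT
);
CREATE TABLE comments (            -- one table for comments on any row
    comment_id INTEGER PRIMARY KEY,
    table_name TEXT NOT NULL,      -- table the comment refers to
    row_id     INTEGER NOT NULL,   -- primary key of the commented row
    body       TEXT NOT NULL
);
"""


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database filled with invented example rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO accessions VALUES (?, ?)",
                     [(1, "ACC001"), (2, "ACC002")])
    conn.executemany("INSERT INTO phenotype_data VALUES (?, ?, ?)",
                     [(1, "height_cm", 92.0), (2, "height_cm", 75.5)])
    conn.executemany("INSERT INTO genotype_data VALUES (?, ?, ?)",
                     [(1, "m1", "A"), (2, "m1", "B")])
    conn.execute("INSERT INTO comments (table_name, row_id, body) VALUES (?, ?, ?)",
                 ("accessions", 1, "Passport data checked"))
    return conn


def accessions_matching(conn, min_height: float, marker: str, allele: str) -> list:
    """Combine two unrelated datasets by joining both to the accession."""
    rows = conn.execute(
        """
        SELECT a.accenumb
        FROM accessions a
        JOIN phenotype_data p ON p.accession_id = a.accession_id
        JOIN genotype_data  g ON g.accession_id = a.accession_id
        WHERE p.trait = 'height_cm' AND p.value >= ?
          AND g.marker = ? AND g.allele = ?
        ORDER BY a.accenumb
        """,
        (min_height, marker, allele),
    ).fetchall()
    return [accenumb for (accenumb,) in rows]


def comments_for(conn, table_name: str, row_id: int) -> list:
    """Fetch every comment attached to one row of any table."""
    rows = conn.execute(
        "SELECT body FROM comments WHERE table_name = ? AND row_id = ?",
        (table_name, row_id),
    ).fetchall()
    return [body for (body,) in rows]


conn = build_demo()
print(accessions_matching(conn, 80.0, "m1", "A"))   # ['ACC001']
print(comments_for(conn, "accessions", 1))          # ['Passport data checked']
```

Keying comments on (table_name, row_id) is one simple way to let a single table annotate anything; the trade-off is that the database cannot enforce that the referenced row exists.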
5. (Schema diagram. Legend: PK = Primary Key, FK = Foreign Key, U = Unique Index, I = Index)
6. (Schema diagram. Legend: PK = Primary Key, FK = Foreign Key, U = Unique Index, I = Index)
7. Loading Data
- Most data can be loaded directly from Excel spreadsheets or MS Access tables.
- MS Access tables are copied directly into the database; Excel spreadsheets in a standard format are first transformed by a Perl script. The data is then moved into the appropriate tables.
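The staging step can be sketched as follows, in Python rather than the Perl the project uses, and assuming the spreadsheet has first been exported to CSV. The choice of INSTCODE, ACCENUMB, GENUS and SPECIES as the "standard format", the staging, taxonomy and accessions tables, and the function name are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical "standard format": these passport-style columns must be present.
STANDARD_COLUMNS = ["INSTCODE", "ACCENUMB", "GENUS", "SPECIES"]

TARGET_SCHEMA = """
CREATE TABLE IF NOT EXISTS staging (instcode TEXT, accenumb TEXT, genus TEXT, species TEXT);
CREATE TABLE IF NOT EXISTS taxonomy (
    taxonomy_id INTEGER PRIMARY KEY,
    genus TEXT NOT NULL, species TEXT NOT NULL,
    UNIQUE (genus, species)
);
CREATE TABLE IF NOT EXISTS accessions (
    accession_id INTEGER PRIMARY KEY,
    instcode TEXT NOT NULL, accenumb TEXT NOT NULL,
    taxonomy_id INTEGER REFERENCES taxonomy(taxonomy_id),
    UNIQUE (instcode, accenumb)
);
"""

MOVE_FROM_STAGING = """
INSERT OR IGNORE INTO taxonomy (genus, species)
    SELECT DISTINCT genus, species FROM staging;
INSERT OR IGNORE INTO accessions (instcode, accenumb, taxonomy_id)
    SELECT s.instcode, s.accenumb, t.taxonomy_id
    FROM staging s
    JOIN taxonomy t ON t.genus = s.genus AND t.species = s.species;
DELETE FROM staging;
"""


def load_standard_sheet(conn: sqlite3.Connection, csv_text: str) -> int:
    """Copy a CSV sheet into staging, then move it into the target tables.

    Returns the number of rows read from the sheet.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in STANDARD_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"sheet is not in the standard format; missing {missing}")
    conn.executescript(TARGET_SCHEMA)
    # Step 1: copy the rows (whitespace-trimmed) into the staging table.
    rows = [tuple((rec[c] or "").strip() for c in STANDARD_COLUMNS) for rec in reader]
    conn.executemany("INSERT INTO staging VALUES (?, ?, ?, ?)", rows)
    # Step 2: move the data into the appropriate tables and empty staging.
    conn.executescript(MOVE_FROM_STAGING)
    return len(rows)


conn = sqlite3.connect(":memory:")
sheet = ("INSTCODE,ACCENUMB,GENUS,SPECIES\n"
         "INST01,ACC001,Pisum,sativum\n"
         "INST01,ACC002, Pisum ,sativum\n")
print(load_standard_sheet(conn, sheet))   # 2
```

Loading into a staging table first keeps format checks separate from the move into the real tables, and INSERT OR IGNORE makes reloading the same sheet harmless.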
8. (No Transcript)
9. Passport Data
10. Passport Data
11. (Schema diagram. Legend: PK = Primary Key, FK = Foreign Key, U = Unique Index, I = Index)
12. Some Curation Necessary
Taxonomy example
13. Collecting Example
Example accession: CGN03335
(Figure: the AccessionCollecting, Collecting and CollectingSites tables)
Collecting and CollectingSites are not updated.
AccessionCollecting is updated only if accession_id and collecting_id match.
14. Curation
- Pre-processing interface that lets users see what information is already in the database before they submit their data.
- Web-based curation tool
  - Developed at Iowa State University
- Submitted data must be validated before it is available to the public.
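The two curation steps, a preview of what already exists followed by a validation gate before publication, can be sketched in plain Python. The class, its methods and the use of accession numbers as keys are hypothetical; they stand in for the web-based tool, whose interface is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class CurationStore:
    """Hypothetical stand-in for the database plus a curation queue."""
    public: dict = field(default_factory=dict)    # accenumb -> record, visible to all
    pending: dict = field(default_factory=dict)   # submitted, awaiting validation

    def preview(self, submission: dict) -> dict:
        """Pre-processing: show the submitter which records already exist."""
        existing = sorted(k for k in submission if k in self.public)
        new = sorted(k for k in submission if k not in self.public)
        return {"existing": existing, "new": new}

    def submit(self, submission: dict) -> None:
        """Queue records; nothing becomes public at this stage."""
        self.pending.update(submission)

    def validate(self, accenumb: str, accepted: bool) -> None:
        """A curator accepts (publishes) or rejects (drops) one pending record."""
        record = self.pending.pop(accenumb)
        if accepted:
            self.public[accenumb] = record


store = CurationStore(public={"ACC001": {"genus": "Pisum"}})
submission = {"ACC001": {"genus": "Pisum"}, "ACC002": {"genus": "Hordeum"}}
print(store.preview(submission))   # {'existing': ['ACC001'], 'new': ['ACC002']}
store.submit(submission)
store.validate("ACC002", accepted=True)
print(sorted(store.public))        # ['ACC001', 'ACC002']
```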
15. Data Interaction
(Schema diagram. Legend: PK = Primary Key, FK = Foreign Key, U = Unique Index, I = Index)
16. Data Association Levels
(Schema diagram. Legend: PK = Primary Key, FK = Foreign Key, U = Unique Index, I = Index)
17. Genetic Data
- Data stored as a 2D array
  - Accessions as dimension 0
  - Markers as dimension 1
- Data is stored as an integer and decoded in the EnumUnits and EnumUnitsArrays tables.
- We are using an Allele Index approach, which will simplify queries significantly in cases where the marker type is not important and only the relative allele values are required.
- Experiment information includes the author, the date, an experiment name and a brief description.
- Method information includes a name, a description and a link to the units. A method is generally broader than an experiment; multiple experiments are expected to use the same method.
- Units indicate the marker type.
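A minimal sketch of the 2D storage and the allele-index idea, in Python. The unit ids, decode table, marker-to-unit mapping and genotype values are invented. Each cell holds only an integer enum_index, which is decoded through a per-unit table when the actual allele is needed, while a relative comparison between two accessions at the same marker needs only the integers.

```python
# Hypothetical decode table in the role of EnumUnits: (unit_id, enum_index) -> allele.
# The same enum_index means different alleles under different units (marker types).
ENUM_UNITS = {
    (7, 0): "A", (7, 1): "G",              # invented SNP-like unit
    (8, 0): "180bp", (8, 1): "184bp",      # invented SSR-like unit
}

accessions = ["ACC001", "ACC002", "ACC003"]   # dimension 0
markers = ["m1", "m2"]                        # dimension 1
marker_unit = {"m1": 7, "m2": 8}              # method -> units link, simplified
genotypes = [                                 # rows: accessions, columns: markers
    [0, 1],
    [0, 0],
    [1, 1],
]


def decode(row: int, col: int) -> str:
    """Turn the stored integer back into an allele via the marker's unit."""
    unit = marker_unit[markers[col]]
    return ENUM_UNITS[(unit, genotypes[row][col])]


def same_allele(row_a: int, row_b: int, col: int) -> bool:
    """Relative comparison at one marker: no decoding or unit lookup needed."""
    return genotypes[row_a][col] == genotypes[row_b][col]


print(decode(0, 1))          # 184bp
print(same_allele(0, 1, 0))  # True
```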
18. Database
(Figure: Genetic Data in GERMINATE. The original data form a 2D array with accessions (reference data) as dimension 0, markers (string data) as dimension 1 and integer values as the data. These are stored as rows of integer_data (dataset_id, index_id, enum_index), string_data (dataset_id, index_id, string_data) and a reference table (dataset_id, index_id, reference_id, table_id); table_id 5 points to the Accessions table, where reference_id = accession_id. Accessions holds accession_id, germinate_id, instcode and accenumb; the Datasets and Metadatasets tables carry fields including metadataset_id, experiment_id, method_id, data_type_id, dimension, dataset_description and size.)
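The split shown in the figure can be sketched in Python. Two details are assumptions rather than something the slide states: data cells get a row-major index_id (row * number of markers + column), and the dataset_ids for the accession dimension, the marker dimension and the data are hypothetical parameters. Only table_id 5 for the Accessions table comes from the figure.

```python
ACCESSIONS_TABLE_ID = 5  # from the figure: table_id 5 -> Accessions table


def flatten_dataset(accession_ids, markers, matrix, dataset_ids=(1, 2, 3)):
    """Split a 2D genotype matrix into reference, string and integer rows.

    dataset_ids: hypothetical ids for (dimension 0, dimension 1, data).
    Row layouts follow the figure:
      reference:    (dataset_id, index_id, reference_id, table_id)
      string_data:  (dataset_id, index_id, string_data)
      integer_data: (dataset_id, index_id, enum_index)
    """
    dim0_id, dim1_id, data_id = dataset_ids
    n_cols = len(markers)
    if len(matrix) != len(accession_ids) or any(len(r) != n_cols for r in matrix):
        raise ValueError("matrix shape must be accessions x markers")
    reference_rows = [(dim0_id, i, acc, ACCESSIONS_TABLE_ID)
                      for i, acc in enumerate(accession_ids)]
    string_rows = [(dim1_id, j, name) for j, name in enumerate(markers)]
    # Assumed row-major numbering of the data cells.
    integer_rows = [(data_id, i * n_cols + j, value)
                    for i, row in enumerate(matrix)
                    for j, value in enumerate(row)]
    return reference_rows, string_rows, integer_rows


def cell(integer_rows, n_cols, i, j):
    """Read back the enum_index for accession i and marker j."""
    by_index = {index_id: value for _, index_id, value in integer_rows}
    return by_index[i * n_cols + j]


refs, strings, ints = flatten_dataset([101, 102], ["m1", "m2", "m3"],
                                      [[0, 1, 2], [1, 1, 0]])
print(refs)                  # [(1, 0, 101, 5), (1, 1, 102, 5)]
print(cell(ints, 3, 1, 2))   # 0
```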
19. Decoding the Allele Index
(Figure: the allele index (enum_index) stored with the data supports relative comparison directly. To recover actual values it is decoded per unit (units 7, 8 and 9 in the example) through enum_table_array_text (unit_id, enum_index, allele_value as a text array) or enum_table_array_int (unit_id, enum_index, enum_value as an integer array).)
- Allows allele values to be expanded to any ploidy level.
- A match of enum_index between unit types does not indicate a match in allele phase.
- Much more efficient than storing the allele values directly, while still permitting searches for individual alleles at any ploidy level.
- If the marker type is not important to a query, the methods, units and enum tables can be bypassed in the join to speed it up.
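How an enum_index can expand to an allele array of any ploidy, while individual alleles remain searchable, can be sketched in Python. The dictionary plays the role of enum_table_array_text; its contents, the unit ids and the genotype column are invented.

```python
# Invented contents in the role of enum_table_array_text:
# (unit_id, enum_index) -> allele_value array, one entry per allele copy.
ENUM_TABLE_ARRAY_TEXT = {
    (7, 0): ["A", "A"],             # diploid homozygote
    (7, 1): ["A", "G"],             # diploid heterozygote
    (9, 0): ["A", "A", "C", "C"],   # tetraploid genotype
    (9, 1): ["C", "C", "C", "C"],
}


def decode(unit_id, enum_index):
    """Expand a stored enum_index into its allele array."""
    return ENUM_TABLE_ARRAY_TEXT[(unit_id, enum_index)]


def enum_indices_with_allele(unit_id, allele):
    """Every enum_index under this unit whose array contains the allele."""
    return sorted(idx for (unit, idx), values in ENUM_TABLE_ARRAY_TEXT.items()
                  if unit == unit_id and allele in values)


def accessions_with_allele(column, unit_id, allele):
    """Search one marker's integer column for an individual allele.

    The allele is translated into a set of enum_index values once; the
    (potentially large) data column is then filtered on integers only.
    """
    wanted = set(enum_indices_with_allele(unit_id, allele))
    return sorted(acc for acc, idx in column.items() if idx in wanted)


marker_column = {"ACC001": 0, "ACC002": 1, "ACC003": 1}   # a marker under unit 7
print(accessions_with_allele(marker_column, 7, "G"))       # ['ACC002', 'ACC003']
print(decode(7, 0) == decode(9, 0))   # False: same enum_index, different unit
```

The last line illustrates the caveat above: enum_index 0 decodes to different genotypes under units 7 and 9, so indices are only comparable within one unit.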
20. Genetic Map Data
- 3 sets of data
  - Population
    - Stored in the Pedigree table; the individuals are referenced in the reference table, which links the population to the dataset.
  - Data used to create the map
    - Stored in the same way as genetic data
  - Map
21. Genetic Map Data
(Figure: the original map data, with loci (string data), positions (real data) and linkage groups (string data) as dataset dimensions, stored in the string_data and real_data tables keyed by dataset_id and index_id.)
Additional information can be stored as additional dimensions of the dataset.
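A map dataset with aligned dimensions, plus one added dimension, can be sketched in Python. The class, the locus and linkage-group names, the positions and the extra dimension are invented; each list is indexed by index_id, mirroring the string_data and real_data rows.

```python
from dataclasses import dataclass, field


@dataclass
class MapDataset:
    """Hypothetical in-memory map: one list per dimension, aligned by index_id."""
    loci: list                 # string data
    positions: list            # real data
    linkage_groups: list       # string data
    extra: dict = field(default_factory=dict)   # added dimensions by name

    def __post_init__(self):
        if not len(self.loci) == len(self.positions) == len(self.linkage_groups):
            raise ValueError("all dimensions must have one entry per locus")

    def add_dimension(self, name, values):
        """Store additional information as another aligned dimension."""
        if len(values) != len(self.loci):
            raise ValueError("a new dimension needs one value per locus")
        self.extra[name] = list(values)

    def loci_on(self, group):
        """Loci on one linkage group, ordered by map position."""
        hits = sorted((pos, locus) for locus, pos, lg
                      in zip(self.loci, self.positions, self.linkage_groups)
                      if lg == group)
        return [locus for _, locus in hits]


genetic_map = MapDataset(["l1", "l2", "l3"], [12.5, 3.0, 40.1], ["1H", "1H", "2H"])
genetic_map.add_dimension("marker_type", ["SSR", "SNP", "SNP"])
print(genetic_map.loci_on("1H"))   # ['l2', 'l1']
```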
22. Image Data
- Store either images or links to images, for access from an interface.
- We should be able to map images
  - For example, spots on microarray images can be mapped to the accession or sample, bringing up information on that accession.
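Mapping spot regions to accessions amounts to a hit test on image coordinates. The sketch below assumes a regular grid of square spots; the layout, the accession ids and the function names are hypothetical. The returned accession id would then be used to look up that accession's information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spot:
    """A rectangular spot on an image, in pixel coordinates (hypothetical layout)."""
    x0: int
    y0: int
    x1: int
    y1: int
    accession: str

    def contains(self, x, y):
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1


def grid_spots(accessions, cols, size):
    """Lay accessions out row by row on a regular grid of square spots."""
    spots = []
    for k, acc in enumerate(accessions):
        row, col = divmod(k, cols)
        spots.append(Spot(col * size, row * size,
                          (col + 1) * size, (row + 1) * size, acc))
    return spots


def accession_at(spots, x, y):
    """Map a click at (x, y) to the accession whose spot contains it, if any."""
    for spot in spots:
        if spot.contains(x, y):
            return spot.accession
    return None


spots = grid_spots(["ACC001", "ACC002", "ACC003"], cols=2, size=10)
print(accession_at(spots, 15, 5))    # ACC002
print(accession_at(spots, 15, 15))   # None
```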
23. (No Transcript)
24. Interface
- Currently we have a lightweight Perl-CGI interface.
- Working towards a more flexible interface that will let users build complex queries.
- Returning results as objects will allow navigation through the data without searching again.
25. Acknowledgments
- University of Dundee
  - Andy Flavell
- Scottish Crop Research Institute (SCRI)
  - David Marshall
  - Robbie Waugh
- Centre for Genetic Resources, The Netherlands (CGN)
  - Theo van Hintum
- John Innes Centre (JIC)
  - Guy Davenport
  - Jo Dicks
  - Mike Ambrose
  - Noel Ellis
- Funding
  - BBSRC Grant 94/BEP17084, the Bioinformatics and E-science program