GERMINATE A Plant Data Management System - PowerPoint PPT Presentation

Description:

melongena. Solanum. 1448. CGN17571. NLD037. TUR. 19960109. Yuvorlak Patlican; PI 167373. eggplant ... melongena. Solanum. 9894. CGN18606. NLD037. TUR. 19960617 ...

Provided by: jenni236

1
GERMINATE: A Plant Data Management System

Jennifer Lee (UoD)
Guy Davenport (JIC)
Andy Flavell (UoD)
Dave Marshall (SCRI)
Robbie Waugh (SCRI)
Jo Dicks (JIC)
Noel Ellis (JIC)
Mike Ambrose (JIC)
Theo van Hintum (CGN)

2
GERMINATE Project

http://bioinf.scri.sari.ac.uk/germinate/
Developed in PostgreSQL
Expected to be released as open source under the
GNU public license
Core of the system is based on the FAO/IPGRI
multi-crop passport descriptors
Additional generic tables to accommodate other
data types.

3
GERMINATE Project
4
The Database

Designed to potentially hold any type of data
associated with plants.
Accession is the entry level to all data. This
permits complex queries across vastly different
datasets.
Comments related to any item in the database can
be stored and accessed easily.

5
Legend: PK = Primary Key, FK = Foreign Key, U = Unique Index, I = Index
6
Legend: PK = Primary Key, FK = Foreign Key, U = Unique Index, I = Index
7
Loading Data

Most data can be loaded directly from Excel
spreadsheets or MS Access tables.
MS Access tables are copied directly into the
database. Excel spreadsheets in a standard format
are first converted by a Perl script. In both
cases the data is then moved into the appropriate
tables.
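
The two-step load described above (copy the sheet in, then move rows into the appropriate tables) can be sketched as follows. This is a minimal illustration, not the project's Perl loader: Python and an in-memory SQLite database stand in for Perl and PostgreSQL, the sheet is assumed to be exported as CSV, and the staging and taxonomy tables and their columns are hypothetical. Only instcode, accenumb and accession_id are names taken from the slides.

```python
import csv
import io
import sqlite3

# Hypothetical standard-format sheet exported as CSV. The column names are
# assumed to follow the multi-crop passport descriptors.
SHEET = """INSTCODE,ACCENUMB,GENUS,SPECIES
NLD037,CGN17571,Solanum,melongena
NLD037,CGN18606,Solanum,melongena
"""


def load_sheet(conn, text):
    """Copy rows into a staging table, then move them into the target tables."""
    cur = conn.cursor()
    # Assumed schema: SQLite is used here only so the sketch is self-contained.
    cur.executescript("""
        CREATE TABLE staging (instcode TEXT, accenumb TEXT, genus TEXT, species TEXT);
        CREATE TABLE taxonomy (
            taxonomy_id INTEGER PRIMARY KEY,
            genus TEXT, species TEXT,
            UNIQUE (genus, species));
        CREATE TABLE accessions (
            accession_id INTEGER PRIMARY KEY,
            instcode TEXT, accenumb TEXT,
            taxonomy_id INTEGER REFERENCES taxonomy,
            UNIQUE (instcode, accenumb));
    """)
    rows = [(r["INSTCODE"], r["ACCENUMB"], r["GENUS"], r["SPECIES"])
            for r in csv.DictReader(io.StringIO(text))]
    # Step 1: copy the sheet as-is into the staging table.
    cur.executemany("INSERT INTO staging VALUES (?, ?, ?, ?)", rows)
    # Step 2: move the rows into the appropriate tables, skipping duplicates.
    cur.execute("INSERT OR IGNORE INTO taxonomy (genus, species) "
                "SELECT DISTINCT genus, species FROM staging")
    cur.execute("""
        INSERT OR IGNORE INTO accessions (instcode, accenumb, taxonomy_id)
        SELECT s.instcode, s.accenumb, t.taxonomy_id
        FROM staging s
        JOIN taxonomy t ON t.genus = s.genus AND t.species = s.species
    """)
    cur.execute("DROP TABLE staging")
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load_sheet(conn, SHEET)
n_taxa = conn.execute("SELECT COUNT(*) FROM taxonomy").fetchone()[0]
n_acc = conn.execute("SELECT COUNT(*) FROM accessions").fetchone()[0]
print(loaded, n_taxa, n_acc)
```

Keeping a staging step means the sheet can be inspected or corrected before anything reaches the main tables.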

8
(No Transcript)
9
Passport Data
10
Passport Data
11
Legend: PK = Primary Key, FK = Foreign Key, U = Unique Index, I = Index
12
Some Curation Necessary
Taxonomy example
13
Collecting Example
Example accession: CGN03335
Tables involved: AccessionCollecting, Collecting,
CollectingSites
Collecting and CollectingSites are not updated.
AccessionCollecting is updated if accession_id and
collecting_id match.
14
Curation

Pre-processing interface to allow users to see
what information is in the database before they
submit their data.
Web-based curation tool.
Developed at Iowa State University
Submitted data must be validated before it is
available to the public.

15
Data Interaction
Legend: PK = Primary Key, FK = Foreign Key, U = Unique Index, I = Index
16
Data Association Levels
Legend: PK = Primary Key, FK = Foreign Key, U = Unique Index, I = Index
17
Genetic Data

Data stored as 2D array
Accessions as dimension 0
Markers as dimension 1
Data is stored as integers and decoded via the
EnumUnits and EnumUnitsArrays tables.
We are using an Allele Index approach which will
simplify queries significantly in cases where the
marker type is not important, but only the
relative allele values are required.
Experiment information includes the author, date,
an experiment name and a brief description.
Method information includes a name, a description
and a link to the units. A method is generally
broader than an experiment: multiple experiments
are expected to use the same method.
Units indicate the marker type.
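
As a rough illustration of the layout described above, the sketch below stores a small accession-by-marker matrix as integer codes and decodes one cell through a lookup table. It assumes a single marker type, and every name and value in it (the marker names, the absent/present coding, the decode helper) is hypothetical rather than taken from the GERMINATE schema.

```python
# Dimension 0: accessions. Dimension 1: markers (marker names are made up).
accessions = ["CGN17571", "CGN18606"]
markers = ["marker_A", "marker_B", "marker_C"]

# Hypothetical decoding table for one unit (marker type): enum_index -> value.
enum_values = {0: "absent", 1: "present"}

# The 2D array, keyed by (index along dimension 0, index along dimension 1).
# Only the integer code is stored for each cell.
integer_data = {
    (0, 0): 1, (0, 1): 0, (0, 2): 1,
    (1, 0): 1, (1, 1): 1, (1, 2): 0,
}


def decode(accession, marker):
    """Look up the stored integer for one cell and decode it to its value."""
    key = (accessions.index(accession), markers.index(marker))
    enum_index = integer_data.get(key)
    if enum_index is None:
        return None  # no data stored for this cell
    return enum_values[enum_index]


print(decode("CGN18606", "marker_B"))  # prints "present"
```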

18
Database
Genetic Data in GERMINATE
(Diagram: the original data is stored as datasets grouped under metadatasets.)
dimension0: accessions (reference data)
dimension1: markers (string data)
data: integer data
Diagram labels: metadataset 2, metadataset 3, dataset 1 (three times)
Data tables shown:
integer_data (enum_index): dataset_id, index_id
string_data: dataset_id, index_id
reference_id: dataset_id, index_id, table_id
table_id 5 -> Accessions table (reference_id is the accession_id)
Accessions: accession_id, germinate_id, instcode, accenumb
Fields shown for Datasets and Metadatasets: experiment_id, metadataset_id, data_type_id, dataset_id, method_id, dimension, dataset_id, dataset_discription, size
19
Decoding the Allele Index
(Diagram: decoding of the allele index through the enum tables.)
Diagram labels: Relative comparison; unit 7 data_value, unit 8 data_value, unit 9 data_value
Tables shown: enum_table_array_text, enum_table_array_int
Fields shown: allele_index, unit_id, enum_index, enum_value, allele_value
Value types shown: text, int, array

Allows expansion of allele values to any ploidy
level.
A match of enum_index between unit types does not
indicate a match in allele phase.
Much more efficient than storing the allele
values as alleles, while still permitting
searching of individual alleles at any ploidy
level.
If the marker type is not important in a query,
the methods, units and enum tables can be bypassed
in the join to speed up the query.
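
A relative comparison of this kind needs only the stored integers, not their decoded values. The sketch below compares two accessions marker by marker on their enum_index codes. It assumes both lists cover the same markers in the same order, so each pair of codes belongs to the same unit type. The function name and the data are hypothetical.

```python
def shared_fraction(codes_a, codes_b):
    """Fraction of markers at which two accessions carry the same enum_index.

    None marks a missing value; markers missing in either accession are skipped.
    Returns None when no marker has data for both accessions.
    """
    pairs = [(a, b) for a, b in zip(codes_a, codes_b)
             if a is not None and b is not None]
    if not pairs:
        return None
    return sum(a == b for a, b in pairs) / len(pairs)


# Made-up enum_index codes for two accessions over the same four markers.
acc_1 = [1, 0, 2, None]
acc_2 = [1, 1, 2, 0]

similarity = shared_fraction(acc_1, acc_2)
print(similarity)
```

No enum table is consulted here, which is the shortcut the slide describes. The comparison is meaningful only within a marker, since equal codes under different unit types do not imply equal alleles.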

20
Genetic Map Data

Three sets of data:
Population
Stored in the Pedigree table, with references to
individuals in the reference table, which links
the population to the dataset.
Data used to create the map
Stored similarly to genetic data.
Map

21
Genetic Map Data
(Diagram: the original data is stored as datasets grouped under metadatasets, along dimension0 and dimension1.)
Datasets shown: loci (string data), position (real data), linkage groups (string data)
Diagram labels: metadataset 2, metadataset 3, dataset 1 (three times)
Fields shown: string_data, real_data, dataset_id, index_id, string_id
Additional information can be stored as added
dimensions to the dataset.
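
One way to picture the map layout is as parallel datasets that share an index: one for locus names, one for positions, one for linkage groups. The sketch below rebuilds an ordered map from three such lists. It is an assumption-laden illustration, not the GERMINATE storage code: the list position stands in for index_id, and all names and values are made up.

```python
from collections import defaultdict

# Three parallel datasets addressed by a shared index (here the list position).
loci = ["locus_1", "locus_2", "locus_3", "locus_4"]   # string data
position = [12.5, 0.0, 3.2, 20.1]                     # real data
linkage_group = ["LG1", "LG1", "LG2", "LG2"]          # string data


def build_map(loci, position, linkage_group):
    """Group loci by linkage group and order them by position."""
    groups = defaultdict(list)
    for name, pos, group in zip(loci, position, linkage_group):
        groups[group].append((pos, name))
    return {group: [name for _, name in sorted(members)]
            for group, members in groups.items()}


genetic_map = build_map(loci, position, linkage_group)
print(genetic_map)
```

Under this layout, a further attribute per locus would simply be one more list sharing the same index.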
22
Image data

Store either images or links to images for access
from an interface.
We should be able to map images.
For example, in the case of microarray images, the
spots can be mapped to the accession/sample so
that selecting a spot brings up information on
that accession.

23
(No Transcript)
24
Interface

Currently we have a lightweight Perl-CGI
interface.
Working towards a more flexible interface that
would allow users to form complex queries.
Returning results as objects will allow navigation
through the data without searching again.

25
Acknowledgments