Title: Creating An Allele Index For NPGS: Bioinformatic Issues
- Edward Buckler
- USDA-ARS at Cornell University, Ithaca, NY
- AIM: Make more useful plants by conserving, finding, and combining better alleles.
- NEED: The National Plant Germplasm System (NPGS) conserves 464,000 accessions and may contain 100,000,000 distinct alleles, but there is no index.
Genetic mapping is the basis of the index, and QTL mapping approaches now exist for virtually all types of populations.
- Near-gene-level resolution achieved in multiple species
- Identification of genes controlling flowering, starch, nutrients, and wood quality
- Positive results in
  - Maize
  - Rice
  - Arabidopsis
  - Conifers
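As a rough illustration of the idea behind marker-trait mapping, the sketch below compares a trait between accessions that carry different alleles at one marker, using Welch's t statistic. It is a deliberately simplified single-marker test, not a full QTL or association mapping method; the function name `welch_t` and the flowering-time values are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference in means (group_a minus group_b)."""
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


# Toy data (invented): days to flowering, grouped by allele at a single marker.
flowering = {
    "A": [68.0, 70.0, 72.0, 69.0],
    "G": [78.0, 80.0, 79.0, 81.0],
}

t = welch_t(flowering["G"], flowering["A"])
print(f"t statistic for G vs. A: {t:.2f}")
```

A large absolute t value suggests the marker is associated with the trait in this sample. Real analyses must also deal with population structure and with testing many markers at once.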
What needs to happen?
Genotyping (0.5 million data points per accession)
Phenotyping (500 data points per accession)
Bioinformatics (GRIN)
Mapping Tools
Breeder Decision Tools
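To give a sense of scale, the per-accession figures above can be multiplied by the 464,000 accessions quoted earlier. This is back-of-the-envelope arithmetic that assumes every accession is genotyped and phenotyped at the stated density and that the per-accession figures count data points; the variable names are illustrative only.

```python
# Figures quoted in the slides.
ACCESSIONS = 464_000                  # accessions conserved by NPGS
GENOTYPE_DP_PER_ACCESSION = 500_000   # 0.5 million genotypic data points each
PHENOTYPE_DP_PER_ACCESSION = 500      # phenotypic data points each

# Assumption: every accession is covered at the full density.
genotype_total = ACCESSIONS * GENOTYPE_DP_PER_ACCESSION
phenotype_total = ACCESSIONS * PHENOTYPE_DP_PER_ACCESSION

print(f"Genotypic data points:  {genotype_total:,}")
print(f"Phenotypic data points: {phenotype_total:,}")
```

Under these assumptions the index would hold about 2.3 x 10^11 genotypic and 2.3 x 10^8 phenotypic data points.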
What data is currently available outside NPGS?
- Several large NSF Plant Genome projects on diversity, with NPGS germplasm at the heart of these projects
- Numerous smaller projects (however, most data from these gets lost over time)
- Millions of genotypic and phenotypic data points in just the maize, wheat, and rice projects
- Database-aware analysis tools (e.g., TASSEL)
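[Figure: software architecture in three layers]
- DBs: GDPDM (Gramene, Panzea (Maize), Rice Evol.), Germinate, GRIN; "GRIN?" is also shown next to the GDPDM databases
- Middleware: GDPC
- Analysis: GDPC Data Browser, TASSEL, Other Analysis Tools
- Panzea Web Data Access: Upload Tools, Display (Alignment, SNP Display)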
- GDPDM
  - Germplasm
  - Genotype
  - Phenotype
  - Environment
- Used by maize, wheat, and rice diversity projects.
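The four data domains listed above could be sketched as linked record types. This is only an illustration of how germplasm, genotype, phenotype, and environment records might relate; it is not the actual GDPDM schema, and every class, field, and accession name here is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Germplasm:
    accession_id: str
    species: str


@dataclass(frozen=True)
class Environment:
    location: str
    year: int


@dataclass(frozen=True)
class Genotype:
    accession_id: str
    marker: str
    allele: str


@dataclass(frozen=True)
class Phenotype:
    accession_id: str
    trait: str
    value: float
    environment: Environment


def phenotypes_by_allele(genotypes, phenotypes, marker, trait):
    """Group trait values by the allele each accession carries at `marker`."""
    allele_of = {g.accession_id: g.allele for g in genotypes if g.marker == marker}
    groups = {}
    for p in phenotypes:
        if p.trait == trait and p.accession_id in allele_of:
            groups.setdefault(allele_of[p.accession_id], []).append(p.value)
    return groups


# Toy records (invented) to show the join between genotype and phenotype.
field_site = Environment(location="site-1", year=2005)
genotypes = [
    Genotype("ACC1", "snp1", "A"),
    Genotype("ACC2", "snp1", "A"),
    Genotype("ACC3", "snp1", "G"),
]
phenotypes = [
    Phenotype("ACC1", "days_to_flowering", 70.0, field_site),
    Phenotype("ACC2", "days_to_flowering", 72.0, field_site),
    Phenotype("ACC3", "days_to_flowering", 80.0, field_site),
]

groups = phenotypes_by_allele(genotypes, phenotypes, "snp1", "days_to_flowering")
print(groups)
```

Keeping the accession identifier as the shared key is what lets genotype and phenotype records collected by different projects be combined.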
Purpose
The purpose of GDPC is to simplify access to the
large genomic and phenotypic datasets that are
becoming available in plant biology.
www.maizegenetics.net/gdpc
[Figure: GDPC Data Flow Diagram]
Databases
- Where has GDPC been mapped?
  - Panzea (GDPDM schema)
  - Gramene (GDPDM schema)
  - Germinate (generic schema)
  - GRIN (passport data)
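Mapping one middleware layer onto databases with different schemas can be pictured as an adapter pattern: each source is wrapped so that analysis tools see a single interface. The sketch below is a generic illustration of that idea, not GDPC's actual API; all class, method, and field names are hypothetical, and the rows are toy data.

```python
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Hypothetical common interface that analysis tools would code against."""

    def __init__(self, name, schema):
        self.name = name
        self.schema = schema

    @abstractmethod
    def list_taxa(self):
        """Return taxon names in a source-independent form."""


class GdpdmSource(DataSource):
    """Stand-in for a database that follows the GDPDM schema."""

    def __init__(self, name, rows):
        super().__init__(name, "GDPDM")
        self._rows = rows  # toy rows shaped like {"taxon": ...}

    def list_taxa(self):
        return sorted({row["taxon"] for row in self._rows})


class PassportSource(DataSource):
    """Stand-in for a passport-data source."""

    def __init__(self, name, records):
        super().__init__(name, "passport data")
        self._records = records  # toy records shaped like {"accession_name": ...}

    def list_taxa(self):
        return sorted({record["accession_name"] for record in self._records})


def taxa_across(sources):
    """Merge taxa from every source through the shared interface."""
    merged = set()
    for source in sources:
        merged.update(source.list_taxa())
    return sorted(merged)


sources = [
    GdpdmSource("Panzea", [{"taxon": "B73"}, {"taxon": "Mo17"}]),
    PassportSource("GRIN", [{"accession_name": "B73"}, {"accession_name": "Oh43"}]),
]
print(taxa_across(sources))
```

With this shape, supporting another database means writing one more adapter class rather than changing each analysis tool.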
GDPC Select Data Service
GDPC Select Taxa
GDPC Nucleotide Data
GDPC Trait Data
GDPC Browser Demo
GDPC Marker Data
Current GDPC Limitations
- XML is not efficient for large datasets
  - Several avenues are possible for improving efficiency
- More visualization and analysis tools need to be developed
  - Linkage Mapping
  - Breeder Decision Tools
  - Geographic interfaces
  - Pedigree Interfaces (in progress)
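One likely reason XML scales poorly is markup overhead: every genotype call repeats its element tags. The sketch below encodes the same toy calls as XML and as tab-separated text and compares their sizes. The element names are invented and do not reflect GDPC's actual message format, and size is only one of several possible costs (parsing time and memory are others).

```python
import xml.etree.ElementTree as ET


def genotypes_as_xml(calls):
    """Encode (taxon, marker, allele) tuples as a simple XML document."""
    root = ET.Element("genotypes")
    for taxon, marker, allele in calls:
        call = ET.SubElement(root, "call")
        ET.SubElement(call, "taxon").text = taxon
        ET.SubElement(call, "marker").text = marker
        ET.SubElement(call, "allele").text = allele
    return ET.tostring(root, encoding="utf-8")


def genotypes_as_tsv(calls):
    """Encode the same tuples as tab-separated lines."""
    return "\n".join("\t".join(call) for call in calls).encode("utf-8")


# Toy dataset: 100 taxa x 100 markers with arbitrary alleles.
calls = [
    (f"taxon{t}", f"snp{m}", "ACGT"[(t + m) % 4])
    for t in range(100)
    for m in range(100)
]

xml_bytes = genotypes_as_xml(calls)
tsv_bytes = genotypes_as_tsv(calls)
print(f"XML: {len(xml_bytes):,} bytes")
print(f"TSV: {len(tsv_bytes):,} bytes")
print(f"Ratio: {len(xml_bytes) / len(tsv_bytes):.1f}x")
```

Compression or a more compact encoding are examples of the kinds of avenues that could reduce this overhead.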
What should GRIN consider?
- Becoming the lead repository for genotypic and phenotypic diversity data
- Leading efforts for the consolidation of community diversity data
- Implementing several middleware or web services standards (e.g., GDPC and perhaps others, such as IRRI's)
- Collaborating on the development of data visualization tools
All of the software can be accessed through:
- www.maizegenetics.net
- www.sourceforge.net
- www.panzea.org
- www.gramene.org