PGF Raw Data Organization

Slides: 4
Provided by: alic141
Learn more at: https://www.csm.ornl.gov

Transcript and Presenter's Notes


1
PGF Raw Data Organization
Project: series of Libraries that define a genome
Library: series of Plates
Plate: 384 Clones
Clone: 2 Lanes
1 Lane: 1MB, each distributed into 4 files
  1 FASTA file: 1KB
  1 scf file: 50KB
  1 abd file: 250KB
  1 rsd/ab1 file: 650KB
In May-03, PGF ran 2.5 million successful lanes
  2.5TB/month
  10 million files
  (0.75TB/month (9 TB/year) non-trace files)
This does not include any assembly, database or metadata!
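
The volume figures on this slide can be checked with a short calculation. The sketch below is illustrative only: the per-file sizes and the lane count come from the slide, while the variable names, the use of decimal units (1 TB = 10^9 KB), and the reading of "non-trace files" as everything except the rsd/ab1 file are assumptions. That reading is chosen because it reproduces the 0.75TB/month figure.

```python
# Per-lane file sizes in KB, as listed on the slide.
FILE_SIZES_KB = {"fasta": 1, "scf": 50, "abd": 250, "rsd/ab1": 650}

# Successful lanes run in May 2003, from the slide.
LANES_PER_MONTH = 2_500_000

# Assumption: decimal units, 1 TB = 1e9 KB.
KB_PER_TB = 1_000_000_000


def monthly_volume_tb(sizes_kb, lanes):
    """Total monthly volume in TB for the given per-lane file sizes."""
    return sum(sizes_kb.values()) * lanes / KB_PER_TB


# Size of one lane: the four files together come to just under 1 MB.
lane_kb = sum(FILE_SIZES_KB.values())

# Four files per lane gives the monthly file count.
files_per_month = len(FILE_SIZES_KB) * LANES_PER_MONTH

# All files, using exact per-file sizes rather than the rounded 1 MB per lane.
total_tb = monthly_volume_tb(FILE_SIZES_KB, LANES_PER_MONTH)

# Assumption: "non-trace" means every file except the rsd/ab1 file.
non_trace = {k: v for k, v in FILE_SIZES_KB.items() if k != "rsd/ab1"}
non_trace_tb = monthly_volume_tb(non_trace, LANES_PER_MONTH)

print(f"lane size: {lane_kb} KB")
print(f"files per month: {files_per_month:,}")
print(f"total: {total_tb:.2f} TB/month")
print(f"non-trace: {non_trace_tb:.2f} TB/month, {non_trace_tb * 12:.1f} TB/year")
```

With exact file sizes a lane is 951KB, so the monthly total comes out slightly under the slide's 2.5TB, which rounds each lane up to 1MB.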
2
Community Access to PGF Data
  • Access to these data is in demand by scientific
    fields that were not anticipated by the Human
    Genome Project
      • Microbiologists
      • Environmental Scientists, BioGeologists
      • Evolutionary Scientists
      • GtL projects
  • Not everyone will want the same kind of files.
  • The computational sophistication of the user
    community is uneven, at best.

3
Data Organization Requirements
1. Metadata for the files being collected
   -- schema definition development
   -- the database system to support the metadata
   -- query interfaces to query the metadata
   -- possible rapid prototyping using the object based tools
2. Data entry tools for the metadata
   -- procedure to enforce metadata entry
   -- checks on the correctness of the metadata entered
None of this was contemplated in the Human Genome Project, but it is
essential for JGI and GTL data management.
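
To make these requirements concrete, here is a minimal sketch of a metadata catalog with a schema, enforced entry, correctness checks, and a query function. Nothing in it comes from the presentation beyond the Project/Library/Plate/Clone/Lane hierarchy and the four file types: the table name `lane_file`, its columns, the helper functions, and the choice of SQLite are all hypothetical, picked only because Python's standard `sqlite3` module allows a self-contained example.

```python
import sqlite3

# Hypothetical schema: one row per file, keyed by the slide's hierarchy.
# NOT NULL enforces metadata entry; CHECK constraints test correctness.
SCHEMA = """
CREATE TABLE lane_file (
    project   TEXT    NOT NULL,
    library   TEXT    NOT NULL,
    plate     TEXT    NOT NULL,
    clone     TEXT    NOT NULL,
    lane      INTEGER NOT NULL CHECK (lane IN (1, 2)),
    file_type TEXT    NOT NULL
              CHECK (file_type IN ('fasta', 'scf', 'abd', 'rsd/ab1')),
    path      TEXT    NOT NULL UNIQUE,
    size_kb   INTEGER NOT NULL CHECK (size_kb > 0)
);
"""

COLUMNS = ("project", "library", "plate", "clone",
           "lane", "file_type", "path", "size_kb")


def open_catalog():
    """Create an in-memory catalog with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def register_file(conn, **fields):
    """Insert one metadata record; return False if a constraint rejects it."""
    values = tuple(fields.get(column) for column in COLUMNS)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO lane_file VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
                values,
            )
        return True
    except sqlite3.IntegrityError:
        return False


def files_of_type(conn, project, file_type):
    """Query interface: paths of one file type within a project."""
    rows = conn.execute(
        "SELECT path FROM lane_file "
        "WHERE project = ? AND file_type = ? ORDER BY path",
        (project, file_type),
    )
    return [row[0] for row in rows]


conn = open_catalog()
record = dict(project="demo", library="LIB1", plate="P001", clone="A01",
              lane=1, file_type="fasta",
              path="demo/LIB1/P001/A01/lane1.fasta", size_kb=1)

accepted = register_file(conn, **record)
# A clone has only 2 lanes, so lane 3 fails the CHECK constraint.
rejected_lane = register_file(
    conn, **{**record, "lane": 3, "path": "demo/other.fasta"})
# Leaving out required fields fails the NOT NULL constraints.
rejected_missing = register_file(
    conn, project="demo", file_type="scf", path="demo/x.scf", size_kb=50)
```

Putting the checks in the schema rather than in the entry tool means every route into the catalog is validated the same way. That is one possible design, not something the slides prescribe.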