What do we need the data for?

Slides: 13

Provided by: teren90

Transcript and Presenter's Notes

1
What do we need the data for?

Rigorous needs definition will drive the
infrastructure requirements
The types of data will impact the infrastructure
requirements, and the types of data are in turn
driven by the needs definition
Draft sequence vs. completely finished sequence
Each requires different tools for support
Our guess is we need completely finished sequence
to validate models
What do the biologists think?
Will this change over time?

2
Guiding Principles

Need a new paradigm on data ownership
Policies should be established up front
Data owned by worldwide community
Hierarchies of data
Not all of the trace data needs to be
released, only the summary data
Need to decide on archival policy
Is it easier to regenerate data than to go back to
trace data?
Treat integration of data as a separate problem
Conceptually centralized integration repository

3
Guiding Principles

Need to define data interfaces
Up front (before the program is announced) we
need
The box
XML? OIL? XML Schema?
Pick one; which box we choose is less important
Allows us to be internally consistent
Does not constrain internal representations

4
Guiding Principles

Need to define data interfaces
Up front (before the program is announced) we
need
The mechanism for filling the box
Structures need to be able to evolve over time in
an organized, but fast way
Can leverage existing tools and infrastructure
and standards being developed by the individual
communities

5
Guiding Principles

Need to have translator capabilities intrinsic in
the infrastructure
Allows us to tie in to external data
Increases value to community at large
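
As a rough illustration of the interface and translator principles on slides 3 to 5, the sketch below uses XML as "the box" and keeps the internal representation separate from it. It is only a sketch under stated assumptions: the `GeneRecord` fields, the `record` element names, and the tab-separated external source are all hypothetical, chosen for illustration, and none of them are specified in the slides.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Field names shared by the internal record and the exchange format (assumed).
FIELDS = ("gene_id", "organism", "sequence")


@dataclass
class GeneRecord:
    """Hypothetical internal representation, free to differ from the box."""
    gene_id: str
    organism: str
    sequence: str


def to_box(rec: GeneRecord) -> str:
    """Fill the agreed exchange format ("the box") from the internal form."""
    root = ET.Element("record")
    for name in FIELDS:
        ET.SubElement(root, name).text = getattr(rec, name)
    return ET.tostring(root, encoding="unicode")


def from_box(xml_text: str) -> GeneRecord:
    """Read a record back out of the exchange format."""
    root = ET.fromstring(xml_text)
    return GeneRecord(*(root.findtext(name, default="") for name in FIELDS))


def from_external_tsv(line: str) -> GeneRecord:
    """Translator for a hypothetical external tab-separated data source."""
    gene_id, organism, sequence = line.rstrip("\n").split("\t")
    return GeneRecord(gene_id, organism, sequence)
```

In this arrangement every external source is translated into the internal record before being written to the box, so tying in a new external source would need only one new translator function.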

6
Guiding Principle

Success of the project will be judged by how well
it is accepted by, and serves, the community at
large, including groups beyond the walls of DOE

7
Where do we need investments?

Integrated databases
New and improved algorithms
Need to leverage tools and intellectual output of
SciDAC and other efforts in
Collaborative computing environments
Scientific visualization

8
What are the showstoppers?

Integration (first priority)
Lack of algorithms
Current algorithms aren't necessarily applicable
Integrated data offers lots of opportunities for
improved accuracy / new algorithms
Data analysis
Specialized data mining algorithms

9
Recommendations

Address data integration problems now
Make high performance computing resources
available for computational biology
Develop tools that allow biologists to perform
inference
Ability to frame questions in an intuitive way
Comparison and analysis capabilities
Example-based queries

10
Recommendations

Realize that a lot of this is the application of
existing CS / Math / Stats techniques, and does
not necessarily require research in these
disciplines
There is interesting CS work here, but not all
of it is research (although some is)
Will not get funded under CS grants
Impact on who should be working on the problems
The National Labs are good at this
interdisciplinary type of work

11
A New Synthesis between Computing and Biology
Laying a Foundation for Understanding
Higher Levels of Biological Complexity

Ecological Processes and Populations
Tissue and Organismal Physiology
Cellular Developmental Processes
Functional and Structural BioComplexity
Biochemical Pathways Processes
Function-Structure Relationships
Gene Regulation Pathways
Comprehensive Genome-based Analysis
Gene Expression Networks
Comparative Protein Analysis
Phylogeny Reconstruction
Comparative Sequence Analysis
Genome Comparisons and Synteny
Protein Structure Modeling
Gene Structure Prediction
Protein Sequence Prediction
Gene and Feature Identification
Genome Assembly
Computing and Information Requirements

12
Biological Data
