CCEGA Vision, Dan Reed (PowerPoint presentation)



1
Carolina Center for Exploratory Genetic Analysis: Introduction and Context
  • Dan Reed
  • Dan_Reed@unc.edu
  • Chancellor's Eminent Professor
  • Director, Renaissance Computing Institute
  • University of North Carolina at Chapel Hill

Supported in part by NIH Grant 5P20RR020751-02
2
Genetics and Disease Susceptibility
(Diagram) Phenotypes 1, 2, 3 and 4; Ancestry, Environment, Age, Gender; Identify Genes; Pharmacokinetics; Metabolism; Endocrine; Biomarker Signatures; Physiology; Proteome; Transcriptome; Immune; Morphometrics; Predictive Disease Susceptibility
Source: David Threadgill/Terry Magnuson
3
The Data Wave
  • Many sources
    • sequencing
    • microarrays
    • environmental
    • public health
    • family studies
    • clinical
  • Many technology enablers
    • increased detector resolution
    • increased storage capability
  • The challenges
    • managing complexity
    • tracking knowledge
    • extracting insight

We Are Here!
4
Data Heterogeneity and Complexity
Genomic, proteomic, transcriptomic, metabolomic,
protein-protein interactions, regulatory
bio-networks, alignments, disease, patterns and
motifs, protein structure, protein
classifications, specialist proteins (enzymes,
receptors),
Source: Carole Goble (Manchester)
5
Convergence and Opportunity
  • Center for Genome Sciences (CCGS)
    • ten-year investment of $245M
    • new center and department
    • 4 buildings and 22 faculty lines
    • advanced facilities and equipment
    • participation by multiple schools and departments
    • $25M anonymous gift for proteomics
  • Renaissance Computing Institute (RENCI)
    • interdisciplinary applications of computing
    • faculty, staff and student collaborations
    • new infrastructure and capabilities
    • technology transfer and economic development

6
Information Visualization Challenges
  • Heterogeneity and complexity
    • data types
      • numerical, non-numerical
      • textual, graphical,
    • sources and scale
      • distributed databases, conferences
      • journals, web pages,
    • ontology and relationships
      • metadata, meanings and connections
  • Compatibility and collaboration
    • from desktop to distributed and collaborative
    • familiar tools and interfaces
  • UNC tiled display wall deployment
    • Health Sciences Library/RENCI collaboration
    • HSL focus on fostering collaboration
    • building renovation and redesign

7
RENCI/Health Sciences Collaboration
June 2005
8
PITAC Data and Software Repositories
  • Findings
    • Explosive growth in sensors and scientific
      instruments has engendered unprecedented volumes
      of data, presenting historic opportunities for
      major scientific breakthroughs in the 21st
      century
    • Computational science now encompasses modeling
      and simulation using data from these and other
      sources, requiring data management, mining, and
      interrogation
  • Recommendations
    • Federal government must provide long-term support
      for computational science community data
      repositories
      • defined frameworks, metadata structures
      • algorithms, data sets, applications
      • review and validation infrastructure
    • Government must require funded researchers to
      deposit their data and research software in these
      repositories or with access providers that
      respect any necessary or appropriate security
      and/or privacy requirements

9
Deep Carolina (Proposed)
  • Features
    • five-year partnership timeline (minimum)
    • RTP/UNC IBM anchors
    • proximity facilitates collaboration
    • joint faculty/staff participation
    • leading-edge computing infrastructure
      • hardware and software
  • Rationale
    • leverage IBM and Triangle resources
    • develop and evaluate new technologies
    • explore applications of computing to new problems
  • Joint resource commitments
    • Carolina and IBM

10
CCEGA Project Goals
  • Develop collaborative experiences and plans
    • preliminary data to apply for a P50 grant
  • Deliverables and activities
    • develop a protocol for prospective studies
      • using ongoing studies as examples to define best
        practices
      • Carolina Cohort
    • develop a prototype informatics infrastructure
      • data models, methods, tools and portals
    • demonstrate the utility of data mining
      • applied to established project(s)
    • facilitate use of best practices for existing
      projects
    • develop an environment for cross-training and
      education
      • formal and informal education touching project
        participants and trainees
  • Foster mutual awareness and shared needs

11
CCEGA Vision
(Diagram) Interoperable Data Management; Faculty, Staff and Students; Driving Problems; Promoting Mutual Awareness; Experimental Genetics Portal; Analysis Techniques; Statistical and Computational Techniques; Extant Data Models; Virtuous Cycle; Interdisciplinary Research and Education
12
Tentative Science Requirements
  • Integrated storage, analysis and exploration
    • reusable infrastructure and shared capability
  • Shared collaborative infrastructure
    • new science and larger collaborations
  • Leverage from other infrastructure
    • distributed resource sharing and use
  • Simplicity, simplicity, simplicity
    • reduce redundant infrastructure construction
    • focus time and talent on research

13
CCEGA Participants
  • Coordination team
    • Dan Reed, RENCI
    • Terry Magnuson, CCGS
    • Alan Blatecky, RENCI
    • Kirk Wilhelmsen, CCGS
  • Eleven departments/institutes
    • Biostatistics
    • Cancer Center
    • CCGS
    • Computer Science
    • Epidemiology
    • Genetics
    • Health Sciences Library
    • Information and Library Science
    • Pharmacy
    • RENCI
    • Statistics
  • Campus-wide support
    • from many sources
  • Project participants
    • Brad Hemminger, Information and Library Science
    • James Evans, Genetics
    • Kevin Gamiel, RENCI
    • Xiaojun Guan, RENCI
    • Barrie Hays, Health Sciences Library
    • Clark Jefferies, RENCI
    • Ethan Lange, Genetics
    • Andrew Nobel, Statistics
    • Karen Mohlke, Genetics
    • Kari North, Epidemiology
    • Susan Paulsen, Computer Science
    • Fernando Manuel Pardo, Genetics
    • Charles Perou, Cancer Center
    • Lavanya Ramakrishnan, RENCI
    • Jan Prins, Computer Science
    • Patrick Sullivan, Genetics
    • Lisa Susswein, Cancer Center
    • David Threadgill, Genetics

14
Formal CCEGA Activities
  • Workshops
    • genetics and disease
    • analysis methods (today)
  • Cross-disciplinary tutorials
    • genotyping
    • XML
    • others to come
  • Working groups
    • ELSI (ethical, legal and social implications), analysis and informatics
  • Software prototyping
    • portal and data model planning
  • Management group
    • planning and strategy

www.renci.org/P20
15
CCEGA Working Groups/Structures
  • ELSI
    • IRB and coordinated data sharing
    • James Evans, Genetics (lead)
  • Exploratory analysis
    • data mining and classification techniques
    • Jan Prins, Computer Science (lead)
  • Informatics
    • LIMS, data models and representations
    • Brad Hemminger, Information and Library Science (lead)
  • Integration and prototyping
    • portals, software and tools
    • Xiaojun Guan, RENCI (lead)
  • Organization and operation
    • weekly meetings with posted topics
    • web summaries for project access
      • www.renci.org/P20

16
Infrastructure and Data
  • Bioinformatics portal
    • standard tools and community interfaces
    • data integration and access
  • Large-scale visualization and collaboration
    • tiled display wall and tools
  • Strawman data models
    • discussion, data validation and tool development
  • Simulated case-control data sets
    • no genotype/phenotype connection (null data)
    • phenotype via simulated development process
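The slide does not say how the null data sets are produced. As a minimal sketch under our own assumptions (biallelic SNPs coded as 0/1/2 minor-allele counts, one hypothetical minor-allele frequency `maf` shared by every SNP, Hardy-Weinberg proportions), the following generates a case-control set in which genotype and phenotype are independent by construction. The names `simulate_null_case_control` and `allele_frequency` are hypothetical, and the sketch does not model the "simulated development process" used for phenotypes.

```python
import random


def simulate_null_case_control(n_cases, n_controls, n_snps, maf=0.3, seed=None):
    """Hypothetical sketch: a case-control data set with no genotype/phenotype link.

    Each genotype is a minor-allele count (0, 1 or 2) built from two independent
    allele draws with probability `maf` (assumed Hardy-Weinberg proportions).
    Case/control labels are fixed by the requested counts and never look at
    genotype, so any association in the output is chance alone.
    """
    rng = random.Random(seed)
    labels = [1] * n_cases + [0] * n_controls  # 1 = case, 0 = control
    rows = []
    for status in labels:
        genotypes = [
            sum(rng.random() < maf for _ in range(2))  # count of minor alleles
            for _ in range(n_snps)
        ]
        rows.append((status, genotypes))
    rng.shuffle(rows)  # interleave cases and controls; independence is unchanged
    return rows


def allele_frequency(rows, snp, status):
    """Estimated minor-allele frequency at one SNP among cases (1) or controls (0)."""
    counts = [g[snp] for s, g in rows if s == status]
    return sum(counts) / (2 * len(counts))


if __name__ == "__main__":
    data = simulate_null_case_control(500, 500, 10, maf=0.3, seed=1)
    for snp in range(3):
        print(snp, round(allele_frequency(data, snp, 1), 3),
              round(allele_frequency(data, snp, 0), 3))
```

Because labels are assigned without reference to genotype, case and control allele frequencies in such a set differ only by sampling noise, which is what makes null data useful for checking the false-positive behavior of an analysis method.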

17
North Carolina Bioportal
  • Goals and features
    • standard interfaces
      • common tools and databases
    • extensibility mechanisms
      • new tools, techniques and data
    • authentication and security
      • controlled access
      • local and remote access
    • national coupling and sharing
  • Currently
    • 100 standard applications
      • EMBOSS, Glimmer, HMMER
      • NCBI, PHYLIP, others
    • growing suite of databases
      • NCBI BLAST, GenBank, GenPept
      • PDB, PRINTS, REBASE, Repbase
      • UniProt, FASTA, genomes, Pfam
      • PROSITE, RefSeq, TRANSFAC, WU-BLAST
    • May 2005 initial release

18
(No Transcript)
19
North Carolina Bioportal
(Architecture diagram) Users; Account Management; BioPortal; MySQL databases; Grid Gatekeeper; MyProxy; GridFTP; OpenPBS; Applications; Application Databases; Pise; Local cluster
  • Open Grid Computing Environment (OGCE)
    • shared development
    • standard web services
    • adopting portal standards (JSR 168)
    • used by cyberinfrastructure projects
      • LEAD, NEES, PACI, DOE, TeraGrid
20
Our Vision of Success
  • Local avatars for the national community
    • driving problems and experiences
    • infrastructure testing and validation
  • Multidisciplinary collaboration
    • biomedical and IT researchers
    • software developers
  • National infrastructure and communities
    • distributed and federated
    • customizable to local needs
    • interoperable and shared
  • The Virtual Observatory astronomy model
    • standard tools
    • metadata and data models
    • virtual community

21
Next: Kirk Wilhelmsen