Title: Future Challenges in Bioinformatics
1. Future Challenges in Bioinformatics
2. Introduction
- Introduction: How RRX got involved
- Life sciences context: How bioinformatics came to be important
- The past half century: How bioinformatics has evolved
3. Introduction
- Categories of Bioinformatics Tools
- Why We Need Supercomputers
- Software Development Issues
- Future Challenges
- Tools for Biotech Projects
- Summary
4. How RRX got involved
- Submitted a Canada Foundation for Innovation (CFI) proposal for Advanced Bioinformatics Collaborative Computing (ABioCC)
5. How RRX got involved
- Developed an SVG-based visualization front end
- Paper will be presented at SVG Open 2003 in
Vancouver on July 17th
6. How bioinformatics came to be important
- After the structure of DNA was solved using X-ray diffraction data in 1953, focus shifted to nucleic acid sequence analysis
- DNA/RNA/protein sequence data accumulated, using computer programs for storage and analysis
7. How bioinformatics came to be important
- Bioinformatics algorithms developed over the last half century came into widespread use by researchers
- The ability to compare sequences placed unknown sequences of interest in a homology context, leading to advances
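The sketch below shows concretely how sequence comparison works as a local alignment score in the Smith-Waterman style. The scoring values (match +2, mismatch -1, linear gap -2) and the name `smith_waterman_score` are illustrative assumptions. They are not the parameters of BLAST, FASTA or any other tool in this talk, and real tools use substitution matrices such as BLOSUM62 and affine gap penalties.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b, using a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned against a gap in b
            left = curr[j - 1] + gap  # b[j-1] aligned against a gap in a
            # Local alignment: a cell never drops below zero, so an alignment can start anywhere
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("GGACGTGG", "TTACGTTT"))
```

The function returns only the best score. Recovering the alignment itself needs a traceback over the full matrix, and this two-row version does not keep that matrix.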
8. How bioinformatics came to be important
- Improved sequencing technology enabled the complete deciphering of the human genome >>> 1999
- About 3.18 billion base pairs
- Celera used 300 PE Biosystems ABI Prism 3700 DNA Analyzers
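A rough back-of-the-envelope estimate shows why even a single genome raises storage and I/O questions. This sketch assumes one haploid copy of the 3.18 billion base pairs quoted above. It compares two encodings: plain text at one byte per base, and 2-bit packing that covers A, C, G and T only, with no room for N or other ambiguity codes. The helper name `storage_bytes` is hypothetical.

```python
GENOME_BP = 3_180_000_000  # approximate human genome size quoted above (one haploid copy)


def storage_bytes(bases: int, bits_per_base: int) -> float:
    """Bytes needed to store `bases` symbols at a fixed number of bits per symbol."""
    return bases * bits_per_base / 8


ascii_bytes = storage_bytes(GENOME_BP, 8)   # plain text: one character per base
packed_bytes = storage_bytes(GENOME_BP, 2)  # 2-bit packing of A/C/G/T only

print(f"plain text: {ascii_bytes / 1e9:.2f} GB")
print(f"2-bit packed: {packed_bytes / 1e9:.3f} GB")
```

That comes to roughly 3.18 GB as text versus about 0.8 GB packed, before adding quality scores, annotation, indexes or the raw trace data from the sequencers.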
9. How bioinformatics has evolved
- Central dogma of molecular biology
- DNA sequences are transcribed into mRNA sequences, and mRNA sequences are translated into protein sequences, which fold into 3D structures with functions; those functions are subject to natural selection, which in turn affects the prevalence of the underlying DNA sequences in a population
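The first two steps of the central dogma map naturally onto string processing. This simplified sketch makes three assumptions: the input is the DNA coding strand, there are no introns or splicing, and the standard genetic code applies. `transcribe` and `translate` are illustrative names. Folding and selection are well beyond a few lines of code.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code: amino acids for all 64 codons in UCAG order; '*' marks a stop codon
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(coding_strand: str) -> str:
    """mRNA has the same sequence as the DNA coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate codons from the first base until a stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):  # only complete codons
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


mrna = transcribe("ATGGCCAAGTAA")
print(mrna, translate(mrna))
```

Codons containing ambiguity codes such as N are not in the table and raise a `KeyError`. A production translator would need to handle them, as well as alternative genetic codes.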
10. How bioinformatics has evolved
- This created a supporting information flow
- Organization and control of genes in the DNA sequence
- Identification of transcriptional units in the DNA sequence
- Prediction of protein structure from sequence
- Analysis of molecular function
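Identifying transcriptional units is hard in general, and gene finders such as GENSCAN and Glimmer rely on statistical models. As a much cruder illustration, the sketch below scans the forward strand for open reading frames, meaning an ATG through the next in-frame stop codon. `find_orfs` and its `min_codons` cutoff are hypothetical. The sketch ignores the reverse strand, nested start codons and splicing.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return half-open (start, end) intervals of ATG...stop ORFs on the forward strand."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                # ORF length in codons, counting the stop codon
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


print(find_orfs("CCATGAAATAGCC"))
```

Real gene finders score candidate regions with codon usage and signal models rather than accepting every ORF above a length cutoff.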
11. How bioinformatics has evolved
- Another covariant information flow was created, based on the scientific method
- Create a hypothesis with respect to biological activity
- Design experiments to test the hypothesis
- Evaluate the resulting data for compatibility with the hypothesis
- Extend/modify the hypothesis in response
12. How bioinformatics has evolved
- IT is used to handle the explosion of data from high-throughput techniques, which is too complex for manual analysis
- X-ray diffraction
13. How bioinformatics has evolved
- Automated DNA sequencing
- Amersham Biosciences
- Applied Biosystems
- Beckman Coulter
- LI-COR
- SpectruMedix Corp.
- Visible Genetics Corp.
14. How bioinformatics has evolved
- Microarray expression analysis
15. How bioinformatics has evolved
- Rapid emergence of 3D macromolecular structure databases
- New subdiscipline: structural bioinformatics
- Atomic and subcellular spatial scales
- Representation/physics
- Storage/retrieval/source data correlation/interpretation
- Analysis/simulation
- Display/visualization
16. How bioinformatics has evolved
17. Categories of Bioinformatics Tools
- Database search/compare
- Sequence Analysis
- Clusters
- Genomics
- Phylogenetics
- Structure Prediction
- Molecular Modelling
- Microarrays
- Packages, Misc Apps, Graphics, Scripts
18. Categories of Bioinformatics Tools
- Database search/compare
- aceperl
- BLAST
- Blastall
- Blastpgp
- BLAT
- Blimps
- Entrez
- FASTA
- fastacmd
- formatdb
- getz
- HMMER
- IMPALA
- InterProScan
- PHI-BLAST
- ProSearch
- PSI-BLAST
- PSI-BLASTN
- Sequin
- Swat
- tace
- xace
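BLAST-style search tools avoid running full dynamic programming against every database sequence. Instead, they first look for short word matches (seeds) and extend only those. The sketch below covers the seeding step only, assuming exact nucleotide words of length k. Real BLAST adds neighbourhood words, score thresholds and ungapped/gapped extension, none of which appear here. `build_word_index` and `word_hits` are hypothetical names.

```python
from collections import defaultdict


def build_word_index(subject: str, k: int) -> dict[str, list[int]]:
    """Map every length-k word in the subject sequence to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(subject) - k + 1):
        index[subject[i:i + k]].append(i)
    return index


def word_hits(query: str, index: dict[str, list[int]], k: int) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a query word exactly matches an indexed word."""
    hits = []
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            hits.append((q, s))
    return hits


idx = build_word_index("GGACGTACGT", 4)
print(word_hits("TTACGT", idx, 4))
```

Hits that share a diagonal (subject position minus query position) are natural candidates for extension into a full alignment.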
19. Sequence Analysis
- Artemis
- Bl2seq
- BLAST
- Clustal W, X
- consed/autofinish
- Cross_match
- Dotter
- EMBOSS
- FASTA
- Glimmer
- HMMER
- InterProScan
- MEME
- View
- Paracel Transcript Assem
- Phrap
- Phred
- Primers
- ProSearch
- Readseq2
- Rnabob
- RRTree
- SAPS
- seals
- Seqsblast
- STADEN
- Swat
- T-Coffee
20. Genomics
- Calc_primers
- Cross_match
- FPC
- GENSCAN
- Glimmer
- Image
- Mzef
- Phrap
- Phred
- STADEN
- Swat
- tace
- tace_celegans
- tRNAscan-SE
- xace
- xace_celegans
21. Phylogenetics
- Clustal W
- Clustal X
- MOLPHY
- MrBayes
- PHYLIP
- RRTree
- T-Coffee
- TREE-PUZZLE
- TreeViewX
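Distance-based tree building starts from a matrix of pairwise distances between aligned sequences. The sketch below first computes p, the observed proportion of differing sites. It then applies the Jukes-Cantor (JC69) correction d = -3/4 ln(1 - 4p/3), which accounts for multiple substitutions at the same site. The function names are hypothetical. Gap columns are simply skipped, which is only one of several conventions.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(a: str, b: str) -> float:
    """Jukes-Cantor (JC69) corrected distance: d = -3/4 * ln(1 - 4/3 * p)."""
    p = p_distance(a, b)
    if p >= 0.75:
        raise ValueError("p >= 0.75: distance is undefined under JC69")
    return -0.75 * math.log(1 - 4 * p / 3)


print(jukes_cantor("ACGTACGTAC", "ACGTACGTAA"))
```

With one difference in ten sites, p = 0.1 and the corrected distance is slightly larger, about 0.107.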
22. Structure Prediction
- EMBOSS
- MEME
- Modeller
- Mzef
- PHI-BLAST
23. Molecular Modelling
- Modeller
- homology modeling from an alignment of the sequence to be modeled with known related structures
- RasMol
- a molecular graphics program intended for 3D visualisation of proteins and nucleic acids
- Raster3D (publication-quality images)
- X3DNA
- analyzing and rebuilding 3D structures
24. Microarrays
- Dapple
- a program for quantitating spots on a two-colour DNA microarray image
- OligoArray
- a program that computes gene-specific oligonucleotides that are free of secondary structure, for genome-scale oligonucleotide microarray construction
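Spot quantitation tools such as Dapple report intensities for each spot. For two-colour arrays, a common next step is a background-subtracted log ratio between the channels, followed by normalisation. This sketch does not reproduce Dapple or any particular pipeline. It assumes per-spot red and green foreground and background intensities are already available. It then applies simple global median centring, which is only one of several normalisation schemes. The function names are hypothetical.

```python
import math
import statistics


def log2_ratio(red: float, green: float,
               red_bg: float = 0.0, green_bg: float = 0.0) -> float:
    """Background-subtracted log2(red/green) for one spot; NaN if either channel is not positive."""
    r = red - red_bg
    g = green - green_bg
    if r <= 0 or g <= 0:
        return math.nan  # spot too weak to quantify after background subtraction
    return math.log2(r / g)


def median_center(ratios: list[float]) -> list[float]:
    """Subtract the median of the valid log ratios so a typical spot sits at 0 (NaNs stay NaN)."""
    valid = [x for x in ratios if not math.isnan(x)]
    m = statistics.median(valid)
    return [x - m for x in ratios]


print(median_center([log2_ratio(1100, 300, 100, 50), 0.5, 1.0]))
```

A log ratio of 0 means equal signal in both channels. Each unit step means a twofold difference.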
25. Packages, Useful Scripts/Source Code, Graphics, Perl
- BioPerl
- BioJava
- boxshade
- mvscf
- seg
- Split_fasta
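Large sequence sets are often split so that jobs can run on several processors, which is the kind of task a script like Split_fasta handles. The sketch below is not that script. It is a hypothetical reader (`read_fasta`) plus a round-robin splitter (`split_records`). It assumes standard FASTA, where each record starts with a `>` header line followed by one or more sequence lines.

```python
from typing import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:  # sequence lines before the first header are ignored
            chunks.append(line)   # a sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def split_records(records: Iterable[tuple[str, str]],
                  n_chunks: int) -> list[list[tuple[str, str]]]:
    """Distribute records round-robin into n_chunks groups, e.g. one per processor."""
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return chunks


records = list(read_fasta(">s1 demo\nACGT\nAC\n\n>s2\nGG\n".splitlines()))
print(split_records(records, 2))
```

In practice you would pass an open file handle to `read_fasta` and write each group out as its own FASTA file. Round-robin assignment keeps the number of records per group balanced, but not total sequence length.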
26. Why We Need Supercomputers
- Some commercial packages run on supercomputers
- Accelrys modeling and simulation
- Materials Studio
- Cerius2 (SGI Unix only)
- Homology modeling to catalyst design
- Insight II (SGI Unix only)
- 3D graphical environment for physics-based molecular modeling
- Catalyst (high-end Unix servers)
- database management, valuable in drug discovery research
- QUANTA (high-end Unix servers)
- crystallographic 2D/3D protein structure solution
- Discovery Studio
27. Why We Need Supercomputers
- Supercomputer advantages
- Multiple processors
- Large shared memory
- Handle very large files
- Large/fast RAID arrays
- Terabyte tape backup systems
- Power backup systems
- High performance networks
28. Why We Need Supercomputers
- Common bioinformatics requirements
- Computationally intensive tasks
- Large memory models
- Intensive/complex database searches
- Large experimental database sets
- Large derived database sets
- Large persistent intermediate data structures
- Teamwork data sharing and visualization
29. Why We Need Supercomputers
- Network requirements
- Driving gigE/10gigE NICs
- Moving large files/data sets rapidly
- Visualization streams/Access Grid
- Coordinating cluster/grid computing
- Dynamic provisioning of light paths
30. Why We Need Supercomputers
31. Why We Need Supercomputers
32. Software Development Issues
- Collaboration contexts/barriers
- Teamwork collaboration spaces
- Standards development (DTDs)
- Integration issues
- experimental data to homology to 3D model
- platform issues
- network issues (9k MTU jumbo frames)
- Licensing issues: public vs. private
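Jumbo frames matter for moving large data sets because per-packet costs such as interrupts and header processing scale with the number of packets. The rough count below assumes 40 bytes of IPv4 plus TCP headers per packet, with no options. It ignores Ethernet framing, retransmissions and TCP options. The helper name `packets_needed` is hypothetical.

```python
IP_TCP_HEADER_BYTES = 40  # assumption: 20-byte IPv4 header + 20-byte TCP header, no options


def packets_needed(payload_bytes: int, mtu: int) -> int:
    """Number of TCP/IPv4 packets needed to carry payload_bytes at a given MTU."""
    per_packet = mtu - IP_TCP_HEADER_BYTES   # application data carried per packet
    return -(-payload_bytes // per_packet)   # ceiling division on integers


one_gb = 10**9
for mtu in (1500, 9000):
    print(mtu, packets_needed(one_gb, mtu))
```

For a 1 GB (10^9 byte) transfer, this gives 684,932 packets at a 1500-byte MTU versus 111,608 at 9000 bytes, roughly a sixfold reduction. Every NIC, switch and router on the path must support the larger MTU for the gain to apply.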
33. Future Challenges
- Creating developer infrastructure for building up structural models from component parts
- components from macromolecule libraries ported to object models
- Understanding the design principles of systems of macromolecules and harnessing them to create new functions
- specialized molecular machines
34. Future Challenges
- Learning to design drugs efficiently and cost-effectively based on knowledge of the target
- target generation automation
- validation automation
- Development of enhanced simulation models that give insight into context-based function from knowledge of structure
- possible use of artificial intelligence to limit the scope of search
35. How Tools might be used for Industry Biotech Projects
36. Summary
- Bioinformatics
- well positioned to assist with application development
- exploring novel bioinformatics software development
- proceeding with support for the Access Grid and optical switching technology
37. Questions/Comments? :-)