Title: Quick Overview of Bioinformatics
1Quick Overview of Bioinformatics
- Chuong Huynh
- NIH/NLM/NCBI
- Bangkok, Thailand
- July 9, 2002
- huynh_at_ncbi.nlm.nih.gov
2What is bioinformatics? - Definition
- My definition: bringing biological themes to computers
- BISTIC Bioinformatics Definition: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data
- BISTIC Computational Biology Definition: The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems
- http://grants2.nih.gov/grants/bistic/bistic.cfm
3Useful/Necessary Bioinformatics Skills
- Strong background in some aspect of molecular biology!!!
- Ability to communicate biological questions comprehensibly to computer scientists
- Thorough comprehension of the problem in the bioinformatics field
- Statistics (association studies, clustering, sampling)
- Ability to filter, parse, and munge data and determine the relationships between the data sets
- Mathematics (e.g. algorithm development)
- Engineering (e.g. robotics)
- Good knowledge of a few molecular biology software packages (molecular modeling / sequence analysis)
- Command line computing environment (Linux/Unix knowledge)
- Data administration (esp. relational database concepts), computer programming skills/experience (C/C++, Sybase, Java, Oracle), and scripting language knowledge (Perl and perhaps Python)
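As an illustration of the "filter, parse, and munge" skill listed above, here is a minimal sketch in Python. It assumes the input is FASTA-formatted text; the function names `parse_fasta` and `gc_content` and the sample records are hypothetical and are not part of the course material.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {record name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # The record name is the first word after '>'
            header = line[1:].split()
            name = header[0] if header else f"seq{len(records) + 1}"
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    sample = ">seq1 demo record\nATGC\nGGCC\n>seq2\nATAT\n"
    for record_name, sequence in parse_fasta(sample).items():
        print(record_name, len(sequence), gc_content(sequence))
```

The same pattern (parse into a simple structure, then filter or summarize) carries over to most flat-file formats met in practice.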
4Bioinformatics Flow Chart (0)
1a. Sequencing
1b. Analysis of nucleic acid seq.
2. Analysis of protein seq.
3. Molecular structure prediction
4. Molecular interaction
5. Metabolic and regulatory networks
6. Gene and protein expression data
7. Drug screening (ab initio drug design OR drug compound screening in database of molecules)
8. Genetic variability
5Bioinformatics Flow Chart (1)
1a. Sequencing
- Base calling
- Physical mapping
- Fragment assembly
1b. Analysis of nucleic acid seq.
- Gene finding (stretch of DNA coding for protein)
- Analysis of noncoding region of genome
- Multiple seq alignment → evolutionary tree
2. Analysis of protein seq.
- Sequence relationship
3. Molecular structure prediction
- 3D modeling: DNA, RNA, protein, lipid/carbohydrate
4. Molecular interaction
- Protein-protein interaction
- Protein-ligand interaction
5. Metabolic and regulatory networks
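Sequence relationships in the steps above are usually quantified by alignment. The sketch below scores a pairwise global alignment with the Needleman-Wunsch dynamic program. It is a simplification: the slide refers to multiple sequence alignment, which is not implemented here. The function name `nw_score` and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for illustration.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of sequences a and b (Needleman-Wunsch)."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(nw_score("GATTACA", "GATTACA"))
    print(nw_score("ACGT", "AGT"))
```

Pairwise scores like these can be turned into a distance matrix, which is one common starting point for building an evolutionary tree.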
6Bioinformatics Flow Chart (2)
6. Gene and protein expression data
7. Drug screening
- Lead compound binds tightly to binding site of target protein
- Lead optimization: lead compound modified to be nontoxic, with few side effects, and deliverable to the target
- Ab initio drug design OR drug compound screening in database of molecules
- Drug molecules designed to be complementary to binding sites, with physicochemical and steric restrictions
8. Genetic variability
- Now investigated at the genome scale
- SNP, SAGE
7Sequencing
Genomic DNA
Shearing/Sonication
Subclone and Sequence
Shotgun reads
Assembly
Contigs
Finishing read
Finishing
Complete sequence
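The step from shotgun reads to contigs can be sketched with a toy greedy overlap assembler. This is a minimal illustration under strong assumptions (error-free reads, forward strand only, no repeats), not how production assemblers work. The function names `overlap` and `greedy_assemble` and the example reads are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap; return the contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # no remaining overlaps: what is left are the contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    reads = ["ATGCGTAC", "GTACGTTA", "GTTAGC"]
    print(greedy_assemble(reads))
```

When reads do not overlap, more than one contig remains, which corresponds to the gaps that the finishing step has to close.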
8Annotation of eukaryotic genomes
Genomic DNA
ab initio gene prediction
transcription
Unprocessed RNA
RNA processing
Mature mRNA
AAAAAAA
Gm3
Comparative gene prediction
translation
Nascent polypeptide
folding
Active enzyme
Functional identification
Reactant A
Product B
Function
9Annotation
- Predict protein
- Extract ORFs
- Remove errors
- Compare with database of proteins of known function
- Provide transitive annotations
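The "extract ORFs" and "predict protein" steps can be sketched as follows. This is a simplified illustration: it scans only the three forward-strand reading frames, assumes the standard genetic code, and does not cover error removal or the database comparison. The function name `extract_orfs` and the `min_codons` parameter are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def extract_orfs(dna, min_codons=3):
    """Return (start, end, protein) for each ATG...stop ORF on the forward strand.

    start and end are 0-based, end-exclusive, and the stop codon is included in the span.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    # Translate codon by codon; unknown codons (e.g. with N) become X
                    protein = "".join(
                        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(start, pos, 3)
                    )
                    orfs.append((start, pos + 3, protein))
                start = None
    return orfs


if __name__ == "__main__":
    print(extract_orfs("CCATGGCTAAATGACC"))
```

The predicted proteins would then be compared against a database of proteins of known function, for example with a similarity search tool, to assign transitive annotations.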
10The new information is always partial
- Complete Eukaryotic Genomes
- Ongoing Eukaryotic
- Prokaryotic Ongoing
- Published
- Even a complete genome is only partially
understood
11Why not use the genome sequence once it's ready?
- Finding exons
- 30% overprediction
- 20% not found at all
- Comparison systems rely on EST sequences, which themselves have high error rates
- Others are looking through partial data
- Once the genome is done: when?
- Expressed sequences are there in part and represent a very, very powerful key.
12Interpreting data from many sources
13Genomics and Tropical Diseases
- How Can Genomics Contribute to the Control of Tropical Diseases?
- Challenges and Opportunities
- The Role of Bioinformatics
14Why Pathogen Genomics?
- The power and cost-effectiveness of modern genome sequencing technology mean that complete genome sequences of 25 of the major bacterial and parasitic pathogens could be available within five years. For about 100 million dollars, we could buy the sequence of every virulence determinant, every protein antigen and every drug target.
B. Bloom (1995) A microbial minimalist. Nature 378:236
15Genomics and Drug Development for Tropical Diseases: Challenges
- Knowledge limitations
- A large proportion of pathogen genes have unknown function
- Heavy investment in genomics is made by the commercial sector, so the results are not widely available
- Emphasis and priorities
- Genomes of non-pathogenic model organisms (S. cerevisiae, D. melanogaster, C. elegans, A. thaliana)
- Genomes of pathogens that affect individuals in developed countries
- Neglected diseases → neglected pathogens
16Doing Successful Science in the new millennium
- Huge increase in available biological information
- Classic paradigm of molecular biology is now shifting rapidly to genomics
- Understanding the new paradigms involves more than just bench biology
- Discovery requires large-scale systems and broad collaborations; global problems
- Funding comes in large amounts at group level; no longer a single laboratory or institution effort
- Accountable output
17The Bigger Picture (Malaria)
18Genomics Approach to Drug Development: Opportunities
- Classical laboratory assays aim at targets in which mutation is lethal to the pathogen
- Valuable targets can be missed
- Sulphonamides: inhibition of the p-aminobenzoic acid pathway is not lethal for growth in the laboratory but severely attenuates the capacity to cause disease
19Genomics Approach to Drug Development: Opportunities
- New approaches for the identification of gene products specifically involved in the disease process may uncover further drug targets
- Signature tagged mutagenesis (STM)
- Transposon site hybridization (TraSH)
- Pathogen genomics and data mining for the discovery of new drug targets
20Fosmidomycin
- September 1999: a basic science breakthrough (data mining through bioinformatics identifies new targets for chemotherapy of malaria)
- 1st semester 2001: results of Phase I clinical trials
21Fosmidomycin example - lesson
- A lesson to take home: 1½ years from data mining and laboratory research to phase II, proof-of-principle clinical trials
22Bioinformatics Opportunities in Health Research
and Development
- New drug research and development
- Identification of novel drug/vaccine targets
- Structural predictions
- Tapping into biodiversity
- Reconstruction of metabolic pathways
- Systems biology
- Identification of vaccine candidates through
analysis of surface antigens and epitopes
23A Window of Opportunity for Disease Endemic
Countries
- Bioinformatics is an extremely important tool, with relevance to studying pathogenic organisms
- Pathogens of interest to disease endemic countries (DECs) are already being sequenced (e.g. P. falciparum, T. cruzi, T. brucei, Leishmania sp.)
- Computational biology is people-intensive and less affected by infrastructure, economics, etc. than other areas of biological research
- Critical mass issues are less critical: a world-wide community is within reach
24Relatively Modest Hardware Needs and Technical
Support
- Linux operating system permits use of the personal computer as a powerful workstation
- Vast repository of public domain software for computational biology
- Individual accounts for remote access and data processing can be opened at high-performance computer facilities and regional centers
- EMB network nodes, FIOCRUZ (Brazil), SANBI (South Africa), CECALCULA (Venezuela), ICGEB (Trieste and New Delhi)
25Relatively Modest Hardware Needs and Technical
Support
- Powerful searches using public websites
- NCBI, EMB nodes, Sanger Center, Expasy/SwissProt, KEGG database
- High-speed internet access is becoming more and more available in disease endemic countries through regional and international support, e.g.
- Asia-Pacific Advanced Network Consortium (APAN) http://www.th.apan.net/
- MIMCom Malaria Research Resources http://www.nlm.nih.gov/mimcom/about.html
26TDR Regional Training Centers: Courses in Bioinformatics
International Training Course on Bioinformatics and Computational Biology Applied to Genome Studies (Train-the-trainers Workshop), May 21-June 15, 2001, FIOCRUZ, Brazil
- Africa
- SANBI, Cape Town, South Africa
- Course 20/Jan-02/Feb/2002
- South America
- USP, São Paulo, Brazil
- Course 18/Feb-02/Mar/2002
- Southeast Asia
- ICGEB, New Delhi, India
- Course 26/April-09/May/2002
- Mahidol University, Bangkok, Thailand
- Course 09-23/Jul/2002
27Beginning Bioinformatics Books
- Baxevanis & Ouellette 2001. Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 2nd Edition. John Wiley Publishing.
- Gibas & Jambeck 2001. Developing Bioinformatics Computer Skills. O'Reilly.
- Mount 2001. Bioinformatics
28Course Schedule
Take out your course schedule.
29The Challenge
What is expected of you?
30Extra Slides
31Genome Sequencing - Review
Strategy
Libraries
Sequencing
Assembly
Closure
Annotation
Release
32Positional Cloning
33Positional Candidate Cloning
34Fosmidomycin
Results
FCT: fever clearance time; PCT: parasite clearance time
35Fosmidomycin
- Objective: to determine proof of concept by evaluating the efficacy of fosmidomycin in uncomplicated P. falciparum malaria
- Study sites: Africa (Gabon), Asia (Thailand)
- Patients: adult uncomplicated P. falciparum malaria
- Regimen: 1200 mg q 8 h for 7 days
- Primary endpoint: cure rate at day 7
- Secondary endpoints: cure rate at day 28, fever clearance time, parasite clearance time
36Fosmidomycin / Next Steps
- Fosmidomycin has intrinsic antimalarial activity, i.e. proof of concept established
- 2nd antimalarial drug with short half-life
- Potential use in drug combinations
- Not good enough to use on its own
- Do more chemistry to improve PK
- A lesson to take home: 1½ years from data mining and laboratory research to phase II, proof-of-principle clinical trials