Title: Introduction to Bioinformatics
1. Introduction to Bioinformatics
2. Reading (all on reserve in library)
- Hagen, J. 2000. The origins of bioinformatics. Nature Reviews Genetics 1: 231-236.
- Bayat, Ardeshir. 2002. Bioinformatics: Science, medicine and the future. BMJ 324: 1018-1022.
3. Bioinformatics
- Combination of computer science, mathematics, physics, chemistry and biology
- Used to answer fundamental questions in the life sciences
  - What are the evolutionary origins of this protein?
  - What gene does this DNA sequence code for?
  - What does this gene do?
  - How does this enzyme/ribozyme work and what does it look like?
  - When is this gene expressed?
  - What genes are expressed before the onset of cancer?
  - What drugs can be used to treat this disease?
  - What mutations are responsible for this genetic disorder?
4. Applications: Human Insulin
Recombinant Insulin
Improvements
5. Sequence and Structure Analysis
- Questions
  - What are the evolutionary origins of this protein?
  - What gene does this DNA sequence code for?
  - What does this gene do?
  - How does this enzyme/ribozyme work and what does it look like?
  - Who was responsible for publishing this information and how can I find out more?
- Types of Analyses
  - Determine phylogeny (MSA and construction of phylogenetic trees)
  - Predict gene locations (ORF Finder or other gene prediction software)
  - Predict gene product function (BLAST or FASTA searches)
  - Predict protein/nucleic acid structure and function (Protein Explorer)
  - Literature searches (PubMed and Galileo)
6. Sequence and Structure Analysis
- Questions
  - When is this gene expressed?
  - What genes are expressed before the onset of cancer?
  - What drugs can be used to treat this disease?
  - What mutations are responsible for this genetic disorder?
- Types of Analyses
  - Traditional bioinformatics analyses
  - Sequencing of genomes
  - Microarray analysis
  - Pharmacogenomics applications
7. Significance to the World
- Locate mutations responsible for genetic diseases
  - Aids in the treatment and diagnosis of those diseases
- Pharmacogenomics
  - Designer drugs and therapies
- Discover and exploit new enzymes
  - Environmental clean-up
  - Antibiotics and other chemotherapeutic agents
  - Useful products
8. http://www.d2ol.com/
Volunteer your computer to fight disease?
9. Major Events in Molecular Biology History
- 1869: DNA discovered
  - Johann Friedrich Miescher's "nuclein"
- 1941: One gene, one enzyme hypothesis (groundwork for the Central Dogma)
  - Beadle and Tatum
- 1950: Complementary bases discovered
  - Erwin Chargaff
- 1953: DNA is a double helix
  - Watson, Crick and Franklin
- 1956: Role of ribosomes
  - George Emil Palade
10. Major Events in the History of Molecular Biology
- 1950s
  - The first protein sequenced
    - Frederick Sanger
  - Edman degradation
    - Simplified Sanger's method
- 1960s
  - Ion exchange columns, chromatography and electrophoresis
    - Sped up the process
  - Pehr Edman
    - Sequenator: automated sequencing
- 1977 (Sanger)
  - Dideoxy termination sequencing for DNA
11. History of Bioinformatics
- From Computational Biology (early 60s) to Bioinformatics (80s)
  - Discovery that macromolecules carry information
  - Availability of computers that could crunch numbers
- Bioinformatics is information driven
  - Large amounts of molecular data and computers to analyze the data became available in the 60s
  - How can we use this data to address our many biological questions?
- Margaret Dayhoff (Bioinformatics Founder)
  - Protein Information Resource database in 1980
  - Algorithms to study protein sequences
  - Tools to design and utilize sequence databases
12. Dayhoff's Contributions
Dayhoff wrote FORTRAN programs to solve a puzzle: sequence assembly went from weeks to minutes!
AVTALWGKVNVDEVG
VHLTPEEKS
AVTALWGKVNV
LVVYPWTQRF
GEALGRLLVVYP
PEEKSAVTA
KVNVDEVGGEALGR
These are short, overlapping segments of the amino acid sequence of hemoglobin.
13. Dayhoff's Contributions
AVTALWGKVNVDEVG
VHLTPEEKS
LVVYPWTQRF
PEEKSAVTA
AVTALWGKVNV
GEALGRLLVVYP
KVNVDEVGGEALGR
14. Dayhoff's Contributions
VHLTPEEKS
PEEKSAVTA
AVTALWGKVNV
AVTALWGKVNVDEVG
KVNVDEVGGEALGR
GEALGRLLVVYP
LVVYPWTQRF
15. Dayhoff's Contributions
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRF
- Her programs were later added to automated sequencers.
- She also established the Atlas of Protein Sequence and Structure (a book, 1965), which later became the PIR (1980)
- The PIR allowed sequence comparison, which led to molecular evolutionary biology (molecular phylogeny)
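The fragment puzzle on these slides can be sketched in a few lines of Python. This is only an illustration of the idea (greedy merging of the best suffix/prefix overlap), not a reconstruction of Dayhoff's FORTRAN programs; the function names `overlap` and `assemble` and the `min_overlap` threshold are assumptions made for this example.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def assemble(fragments: list[str], min_overlap: int = 3) -> list[str]:
    """Greedily merge fragments; returns the remaining contig(s)."""
    # Drop duplicates and fragments fully contained in another fragment.
    unique = list(dict.fromkeys(fragments))
    contigs = [f for f in unique if not any(f != g and f in g for g in unique)]
    while len(contigs) > 1:
        # Find the ordered pair (i, j) with the largest suffix/prefix overlap.
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n < min_overlap:
            break  # nothing left that overlaps convincingly
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# The hemoglobin fragments shown on the slides.
fragments = [
    "AVTALWGKVNVDEVG", "VHLTPEEKS", "AVTALWGKVNV", "LVVYPWTQRF",
    "GEALGRLLVVYP", "PEEKSAVTA", "KVNVDEVGGEALGR",
]
print(assemble(fragments))
```

For these seven fragments the greedy strategy ends with a single contig. Greedy merging is not guaranteed to find the best assembly in general, for example when sequences contain repeats.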
16. Macromolecular Information
- Three concepts gained much attention in the late 60s and 70s
- The genetic code
  - How do genes code for proteins?
  - DNA and RNA sequencing became popular in the 1970s
- Protein structure
  - How do proteins fold into functional products?
  - Christian Anfinsen discovered that denatured proteins often refold into their original structure → Protein structure CAN be predicted!
  - X-ray diffraction leads to X-ray crystallography
  - Protein structures are predicted and more information is gathered on their exact function in a cell
- Protein evolution
  - Zuckerkandl and Pauling → semantides
  - Information-carrying sequences can be used to measure change
  - Amino acid substitutions occur at a constant rate → molecular clocks
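The molecular clock idea can be illustrated with a small calculation. The sketch below assumes two already-aligned sequences of equal length, uses the proportion of differing sites (p-distance) with an optional Poisson correction, and converts distance to time with t = d / (2r), since both lineages accumulate changes after they split. The function names, the toy sequences and the rate value are hypothetical and chosen only for illustration.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)


def poisson_distance(p: float) -> float:
    """Poisson-corrected distance, which allows for repeated changes at a site."""
    return -math.log(1.0 - p)


def divergence_time(distance: float, rate: float) -> float:
    """Time since the split, t = d / (2r), for a rate r per site per year."""
    return distance / (2.0 * rate)


# Toy example: two aligned 10-residue peptides differing at 2 sites.
p = p_distance("AVTALWGKVN", "AVSALWGKIN")
d = poisson_distance(p)
# Hypothetical rate of 1e-9 substitutions per site per year.
print(p, d, divergence_time(d, 1e-9))
```

Real analyses estimate the rate from calibration points such as fossil dates, and the constant-rate assumption does not hold equally well for all proteins.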
17. Development of Algorithms
- Phylogenetic algorithm
  - Complex mathematical formula used to determine sequence homology
  - All possible ways a large number of sequences can be compared to one another
- Fitch and Margoliash
  - 1000 comparisons: the computer calculates the minimum number of mutations needed to convert one sequence to another
- Needleman and Wunsch
  - Elaborated on the original algorithm
- Dayhoff and Eck
  - Took each possible amino acid change during protein evolution into account
  - PAM and MDM matrices
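To make the alignment idea concrete, here is a minimal sketch of global alignment by dynamic programming, in the style of the Needleman and Wunsch approach. The scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions; a realistic protein alignment would replace the match/mismatch scores with a substitution matrix such as PAM. The function name is also an assumption.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -1):
    """Return (score, aligned_a, aligned_b) for a global alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the table: each cell is the best of substitution or a gap.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one best alignment.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


result = needleman_wunsch("KVNVDEVG", "KVNVEVG")
print(result[0])
print(result[1])
print(result[2])
```

The table has (len(a) + 1) x (len(b) + 1) cells, so time and memory grow with the product of the two sequence lengths.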
18. Summary of Bioinformatics Applications to Learn
- What can you do with a sequence?
- Determine gene function
  - BLAST queries (Chapter 2)
  - Determine protein sequence and predict protein structure (Chapters 7 and 8)
  - Plan site-directed mutagenesis experiments to determine function
    - Bioinformatics techniques required: designing primers, mapping restriction sites, building contiguous sequences from sequence products (Chapter 1)
- Locate genes
  - ORF finder and pattern searching (Chapter 6)
- Determine gene evolutionary origins
  - Multiple sequence alignments, phylogenetic trees (Chapters 3, 4 and 5)
- Learn something about an organism's gene expression
  - Gene expression (microarray data) analysis (Chapter 6)
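As a taste of what locating genes with an ORF finder involves, the sketch below scans the three forward reading frames of a DNA string for stretches that begin with ATG and end at the first in-frame stop codon. It is a simplified illustration, not the NCBI ORF Finder: it ignores the reverse strand and alternative start codons, and the function name and `min_codons` parameter are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2):
    """Return (start, end, sequence) for forward-strand ORFs.

    start and end are 0-based, end-exclusive positions; the sequence
    includes both the ATG start codon and the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through the frame one codon at a time.
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                # Count codons before the stop, including the ATG.
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, dna[start:pos + 3]))
                start = None
    return orfs


# Toy sequence with one short ORF in the third reading frame.
print(find_orfs("CCATGAAATTTTAGGG"))
```

In practice `min_codons` would be set much higher, since short ORFs occur frequently by chance.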
19. Bioinformatics
- What can you tell from a sequence?
20. Bioinformatics
- Genomics
  - DNA sequence
  - Homology, locations of genes and functional sites, phylogeny, mapping
  - Infer function
- Transcriptomics
  - mRNA sequence and structure
  - Determine expression mechanisms by identifying alternative splicing regions
- Proteomics
  - Amino acid sequence and protein structure
  - Predict structure
  - Solve structure
  - Infer function from structure