Title: Introduction to Computational Biology
1 Introduction to Computational Biology
Instructor: Yao-Ting Huang
Slides are adapted from Serafim Batzoglou, http://ai.stanford.edu/serafim/CS262_2006/
Bioinformatics Laboratory, Department of Computer Science and Information Engineering, National Chung Cheng University.
2 Language Requirement
- There are two official languages in this class.
- One is English.
- The other is Latvian.
- But we hope to have more two-way discussion than one-way teaching.
- It is OK to speak Chinese.
- Other people will translate into English.
3 Reference Book
- An Introduction to Bioinformatics Algorithms, by
Neil C. Jones and Pavel A. Pevzner, MIT Press,
2004.
4 Other Resources
- This brand-new field lacks a wide choice of textbooks.
- You will have to use Google and Wikipedia from time to time.
- Bring your laptop if you need it.
- I will try to release slides and materials before class.
5 Grading Policy
- One midterm exam (30%)
- Final project (30%)
- Presentation of selected papers (25%)
- Class participation (15%)
6 Prerequisites
- Basic concepts in algorithms and programming
  - Data structures
  - Big-O notation, e.g., O(n²)
- How about biology?
  - It will be introduced in a problem-based manner.
7 The Reason Why You Take This Course
8 Clarification of Terms
- People often confuse the following terms:
  - Bioinformatics, computational biology, and DNA computing.
- Computational biology has nothing to do with DNA computing as studied in the theory of computation.
- Computational biology was the term used first.
- Bioinformatics was coined later to name research issues in this field.
- Bioinformatics issues mainly refer to biological problems solved by computational approaches.
9 Another Scientific Field Under Development
- Computational biology (or bioinformatics) was expected to be an interdisciplinary science.
- It covers knowledge in biology, computer science, and statistics.
- Dr. Eddy suggested that it is an ante-disciplinary science.
- Read his article if you are confused about this field.
10 Double Helix
- Discovered by Watson and Crick, Nature, 1953.
- 900 words, 2 pages
AGCTGGCAT
11 Human Genome
- In 2001, the International Human Genome Project released the first draft of the human genome sequence.
- Note that this sequence comes from only one person.
12 Human Genome
- 23 pairs of chromosomes
  - 22 autosomal pairs
  - 1 sex chromosome pair: XX for females, XY for males
- About 3,000,000,000 bp (base pairs)
13 Shotgun Sequencing
AGCATGCTGCAGTCATGCTTAGGCTA
14 Genome Assembly
- The experimental results are a collection of short DNA fragments.
- How can we reconstruct the original sequence from these short strings?
AGCATGCTGCAGTCATGCTTAGGCTA
15 Genome Assembly
- The experimental results are a collection of short DNA fragments.
  - e.g., ATGGCCGA, GAGGTA, GTAACTGGT, ...
- The assembly problem can be formulated as finding the shortest superstring, i.e., the shortest string that contains every fragment as a substring.
  - e.g., GCCATAGCCTA is a superstring of
    - ATAG
    - GCCAT
    - AGCCT
    - CCTA
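Finding the shortest superstring exactly is NP-hard, so practical assemblers rely on heuristics. The following is a minimal sketch of one classic heuristic, greedy merging by maximum suffix-prefix overlap. It is not necessarily the method any particular assembler uses, and the function names are illustrative. The algorithm first drops fragments contained in other fragments. It then repeatedly merges the pair with the longest overlap until one string remains.

```python
from itertools import permutations

def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_superstring(fragments):
    """Greedy heuristic: merge the pair with maximum overlap until one string is left."""
    frags = list(dict.fromkeys(fragments))  # remove exact duplicates, keep order
    # A fragment contained in another fragment adds nothing; drop it.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, j in permutations(range(len(frags)), 2):
            k = overlap(frags[i], frags[j])
            if k > best_k:
                best_k, best_i, best_j = k, i, j
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags[0] if frags else ""

print(greedy_superstring(["ATAG", "GCCAT", "AGCCT", "CCTA"]))  # GCCATAGCCTA
```

On the four fragments above, the greedy merges reproduce the 11-letter superstring from the slide. In general, however, greedy merging is not guaranteed to find the shortest superstring.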
16 Fragment Assembly
- Given N reads, where N ≈ 6 million, comparing all pairs of reads would take on the order of N² ≈ 3.6 × 10^13 comparisons.
- We need to use a linear-time algorithm.
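One common way to avoid all-pairs comparison is a k-mer index. Every length-k substring of every read goes into a hash table, and only reads that share a k-mer are examined as overlap candidates. The following is a minimal sketch under simplifying assumptions: reads are error-free, only exact k-mer matches count, and k = 3 is a hypothetical choice for the tiny example. For a fixed k, building the index takes time roughly proportional to the total length of the reads. The number of candidate pairs can still grow large for repetitive sequences.

```python
from collections import defaultdict
from itertools import combinations

def candidate_pairs(reads, k):
    """Return sorted pairs (i, j), i < j, of reads that share at least one k-mer."""
    index = defaultdict(set)  # k-mer -> ids of the reads containing it
    for rid, read in enumerate(reads):
        for start in range(len(read) - k + 1):
            index[read[start:start + k]].add(rid)
    pairs = set()
    for ids in index.values():
        # Only reads sharing this k-mer become candidates for a detailed overlap check.
        pairs.update(combinations(sorted(ids), 2))
    return sorted(pairs)

reads = ["ATGGCCGA", "GAGGTA", "GTAACTGGT"]  # example reads from the slides
print(candidate_pairs(reads, k=3))  # [(0, 2), (1, 2)]
```

The first and third reads share TGG, and the second and third share GGT and GTA. The first and second reads share no 3-mer, so that pair is never compared.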
17 History of Shotgun Sequencing
- Gene Myers: "Let's sequence the human genome with the shotgun strategy."
- Phil Green: "That is impossible, and a bad idea anyway."
18 Assembly Programs
- Phil Green's assembly program, PHRAP, was the popular tool used by most biologists.
- Gene Myers proposed a new idea for accelerating sequencing and assembly, but the reaction was incredibly negative.
- After several rejections, his paper was finally accepted on the condition that a rebuttal by Phil Green be published with it.
19 Celera
- In 1998, a biotech company announced that it would form a new company, Celera, to sequence the genome using Myers' shotgun approach.
- This launched the competition between the Human Genome Project and Celera.
- In 1999, with half a million lines of code, Celera sequenced and assembled the first genome of Drosophila (the fruit fly).
- In 2000, Celera accomplished a first assembly of the human genome.
20 Celera
- Gene Myers on the competition:
  - "It was an incredible time of my life. I thought I could do it, but I didn't yet know I could do it. The pressure was incredible, but so was the excitement."
- In 2001, the Human Genome Project consortium and Celera together released the draft human genome.
21 Big Shots in This Field
22 Purpose of DNA Analysis
- Why do we extract and study DNA sequences?
  - Better medical treatment.
  - Disease prevention and diagnosis.
  - Personalized drug design.
23 A Huge Amount of Information from DNA Sequences
- Are all of these nucleotides functional?
- We often refer to the functional parts as genes.
AGGCGCGGGGGGTTAAGAGCTATGCCATTTATATAAAATTTAAAGCGTA
AGAGCTATGCCATTTATATAAAATTTAAAGGCGCGGGGGGTTAAGAGGCG
CGGGGGGTTAAGAGCTATGCCATTTATATAAAATTTAAAGCGTAAGAAGC
TATGCCATTTATATAAAATTTAAAGCGTAAGAGCTATGCCATTTATATAA
AATTTAAAGAGGCGCGGGGGGTTAAGAGCTATGCCATTTATATAAAATTT
AAAGCGTAAGAGCTATGCCATTTATATACAATCTAAAGTTAAAGCGTAAG
AGCTATGCAGGCGCGGGGGGTTAAGAGCTATGCCATTTATATAAAATTTA
AAGCGTAAGAGCTATGCCATTTATATAAAATTTAAAGAGGCGCGGGGGGT
TAAGAGCTATGCCATTTATATAAAATTTAAAGCGTAAGAGCTATGCCATT
TATATAAAATTTAAAGAAAGCGTAAGAGCTATGCCATTTATATAAAATTT
AAAGGGCGCGGGGGGTTAAGAGCTATGCCATTTATATAAAATTTAAAGCG
TAAGAGCTATGCCATTTATATAAAATTTAAAGGGCGCGGGGGGTTAAGAG
CTATGCCATTAGGCGCGGGGGGTTAAGAGCTATGCCATTTATATAAAATT
TAAAGCGTAAGAGCTATGCCATTTATATAAAATTTAAAGTATAAAATTTA
AAGAGGCGCGGGGGGTTAAGAGCTATGCCATTTATATAAAATTTAAAGCG
TAAGAGCTATGCCATTTATATAAAATTTAAAGAAAGCGTAAGAGCTATGC
CATTTATATAAAATTTAAAGGGCGCGGGGGGTTAAGAGCTATGCCATTTA
TATAAAATTTAAAGCGTAAGAGCTATGCCATTTATATAAAATTTAAAGGG
CGCGGGGGGTTAAGAGCTATGCCATTTAAAATTTAAAGGCGCGGGGGGTT
AAGAGGCGCGGGGGGTTAAGAGCTATGCCATTTATATAAAATTTAAAGCG
TAAGAAGCTATGCCATTTATATAAAATTTAAAGCGTAAGAGCTATGCCAT
TTATATAAAATTTAAAGAGGCGCGGGGGGTTAAGAGCTATGCCATTTATA
TAAAATTTAAAGCGTAAGAGCTATGCCATTTATATAAAATTTAAAGTTAA
AGCGTAAGAGCTATGCAGGCGCGGGGAGCTGGGTTTATATAAAATTTA
24 Gene
- A gene is a portion of DNA that codes for a protein.
- There are about 21,000 genes in the human genome.
[Figure: DNA (e.g., AGCCTAGTTGCAAA) → RNA → Protein]
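The DNA → RNA → Protein flow can be sketched in a few lines. The sketch rests on several simplifying assumptions:
- The DNA string is read as the coding strand, so transcription amounts to replacing T with U.
- Translation starts at the first base, with no start-codon search, and drops an incomplete trailing codon.
- Stop codons are emitted as '*' rather than ending translation.
- The codon table is the standard genetic code, packed into a 64-character string indexed by base order U, C, A, G.

```python
BASES = "UCAG"
# Standard genetic code; codons ordered UUU, UUC, UUA, UUG, UCU, ..., GGG ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

def transcribe(dna):
    """Coding-strand DNA -> mRNA: replace thymine (T) with uracil (U)."""
    return dna.upper().replace("T", "U")

def translate(rna):
    """Translate mRNA codon by codon; a trailing partial codon is ignored."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        b1, b2, b3 = (BASES.index(b) for b in rna[i:i + 3])
        protein.append(AMINO_ACIDS[16 * b1 + 4 * b2 + b3])
    return "".join(protein)

rna = transcribe("AGCCTAGTTGCAAA")
print(rna)             # AGCCUAGUUGCAAA
print(translate(rna))  # SLVA
```

The example DNA yields the codons AGC, CUA, GUU, and GCA (serine, leucine, valine, alanine). The two leftover bases are not translated.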
25 Genetic Variants
- The human genome sequence obtained in 2001 serves only as a reference (i.e., it came from one person).
- But our genomes differ from each other through various types of genetic variation.
  - e.g., green and black eye-color phenotypes.
26 Genetic Variants
- Genetic variants differ among members of the human population.
DNA sequences of 6 individuals:
  Black eye   GATATTCGTACGGA-T
  Brown eye   GATGTTCGTACTGAAT
  Black eye   GATATTCGTACGGA-T
  Blue eye    GATATTCGTACGGAAT
  Brown eye   GATGTTCGTACTGAAT
  Brown eye   GATGTTCGTACTGAAT
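Given aligned sequences like the six above, the positions where individuals differ can be found by scanning the alignment column by column. The following is a minimal sketch. It assumes the sequences are already aligned to equal length, with '-' marking a gap. The function name and the pairing of sequences with eye colors are illustrative.

```python
def variable_columns(aligned):
    """Return (0-based position, sorted symbols) for every column with more than one symbol."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    for pos in range(length):
        alleles = {s[pos] for s in aligned}
        if len(alleles) > 1:
            variants.append((pos, sorted(alleles)))
    return variants

individuals = [
    "GATATTCGTACGGA-T",  # black eye
    "GATGTTCGTACTGAAT",  # brown eye
    "GATATTCGTACGGA-T",  # black eye
    "GATATTCGTACGGAAT",  # blue eye
    "GATGTTCGTACTGAAT",  # brown eye
    "GATGTTCGTACTGAAT",  # brown eye
]
for pos, alleles in variable_columns(individuals):
    print(pos + 1, alleles)  # 1-based column number
```

This prints columns 4 (A/G), 12 (G/T), and 15 (gap/A). These are two single-nucleotide differences and one insertion/deletion among the six sequences.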
27 (No Transcript)
28 Association Study
- DNA can be used for diagnosis.
- Association studies collect and compare DNA sequences from cases (patients) and controls (healthy individuals).
Cases:    -A A T T T G C T C-
Controls: -A A T C T G C T C-
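A basic way to test whether a variant is associated with disease follows two steps. Take the T/C difference in the fourth position above and count the carriers of each allele among cases and controls. Then apply a chi-square test to the resulting 2×2 table. The following is a minimal sketch with several stated assumptions:
- It uses the standard 2×2 formula χ² = N(ad − bc)² / ((a+b)(c+d)(a+c)(b+d)), without continuity correction.
- It treats each sequence as one allele.
- The counts are hypothetical toy numbers, not data from any study.

```python
def allele_counts(seqs, pos, allele):
    """Return (sequences carrying `allele` at `pos`, sequences that do not)."""
    carrying = sum(1 for s in seqs if s[pos] == allele)
    return carrying, len(seqs) - carrying

def chi_square_2x2(a, b, c, d):
    """Chi-square statistic (1 df, no continuity correction) for the table
    [[a, b], [c, d]]: rows = cases/controls, columns = allele present/absent."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom

# Hypothetical toy samples built from the two example sequences (0-based position 3: T vs C).
cases = ["AATTTGCTC"] * 30 + ["AATCTGCTC"] * 10
controls = ["AATTTGCTC"] * 10 + ["AATCTGCTC"] * 30

a, b = allele_counts(cases, 3, "T")
c, d = allele_counts(controls, 3, "T")
print(a, b, c, d, chi_square_2x2(a, b, c, d))  # 30 10 10 30 20.0
```

At 1 degree of freedom, a statistic above about 3.84 corresponds to p < 0.05. Real studies also correct for testing many variants at once.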
29 Many other animals will be sequenced and aligned
30 Other Sequencing Projects
- 2002 Mouse genome
- 2002 Rice genome
- 2004 Rat genome
- 2005 Chimpanzee genome
- Why do we need to know the DNA of other species?
31 Reconstruction of Phylogenetic History
Can we reconstruct their phylogenetic history based on their DNA sequences?
[Figure: tree with unknown branches (?) relating Macaque, Human, Chimpanzee, and Mouse]
32 Orz's Evolution
- Given a set of orz-related words: Orz, OTZ, orz, crz, OTS, oO, Crz, or2.
- Can you reconstruct the evolutionary history of these words?
[Figure: a candidate evolutionary tree over orz, Orz, Crz, and or2]
33 Reconstruction of Phylogenetic Tree
- Goal: a tree with the minimum total number of changes (e.g., mutations or reversals).
[Figure: phylogenetic tree relating orz, Orz, Crz, or2, OTZ, oO, and STO]
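For a fixed tree topology, the minimum number of changes can be computed with Fitch's small-parsimony algorithm. The following is a minimal sketch with several stated assumptions:
- The tree is rooted and binary, written as nested tuples.
- The words all have the same length.
- Single-character substitutions are the only changes counted. The sketch does not model reversals or length changes, so words like oO are left out.
- The tree shown is a hypothetical example, not the one in the figure.

```python
def fitch(tree, pos):
    """Return (candidate state set, number of changes) at character position `pos`."""
    if isinstance(tree, str):  # leaf: a word
        return {tree[pos]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, pos)
    right_set, right_cost = fitch(right, pos)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: one change is needed on one of the two branches.
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree, length):
    """Total number of substitutions over all positions (positions are independent)."""
    return sum(fitch(tree, pos)[1] for pos in range(length))

tree = (("orz", "Orz"), ("Crz", "or2"))
print(parsimony_score(tree, 3))  # 3
```

The score of 3 matches the intuition that Orz, Crz, and or2 each differ from orz by a single character. Trying every topology and keeping the lowest score solves the full problem for small inputs. The number of topologies grows super-exponentially, however, so larger inputs need heuristics.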
34 Syllabus
- Classical and advanced alignment algorithms
- Space-saving strategy
- BLAST and PatternHunter
- Genome assembly algorithms
- Evolutionary analysis
- Hidden Markov Models
- Other Selected Topics
35 Happy Marriage?
- Bioinformatics is often compared to a marriage between computer science and biology.
- But, as with all couples, after more than ten years of marriage...
36 (No Transcript)
37 Computer Scientists vs Biologists
- In biology, (almost) nothing is ever completely true or false.
- In computer science, everything is either true or false.
38 Computer Scientists vs Biologists
- Biologists strive to understand the very complicated, very messy natural world.
- Computer scientists seek to build their own clean and organized virtual worlds.
39 Computer Scientists vs Biologists
- Biologists are more data-driven.
- Computer scientists are more algorithm-driven.
- One consequence: CS web pages have fancier graphics, while biology web pages have more content.
40 Computer Scientists vs Biologists
- Biologists are obsessed with being the first to discover something.
- Computer scientists are obsessed with being the first to invent, improve, or prove something.
41 Computer Scientists vs Biologists
- Biologists are quite comfortable with the idea that all data has errors.
- Computer scientists are not.
42 Computer Scientists vs Biologists
- Computer scientists get well-paid jobs after graduation.
- Biologists typically have to complete one or more post-docs...
43 Deciphering DNA
- If you have ever been curious about life, destiny, ..., you might consider studying DNA from now on.
44 A Guest
- Prof. Ting will join us and show how to behave like a good student and lecturer.
- We have to find a new time slot to welcome him.