Introduction to Computational Biology

1
Introduction to Computational Biology
Instructor: Yao-Ting Huang
Slides originated from Serafim Batzoglou
http://ai.stanford.edu/~serafim/CS262_2006/
Bioinformatics Laboratory, Department of Computer
Science & Information Engineering, National Chung
Cheng University.
2
Language Requirement
  • There are two official languages in this class.
  • One is English.
  • The other is Latvian.
  • But we wish to have more two-way discussion than
    one-way teaching.
  • It is OK to speak Chinese.
  • Other people will translate into English.

3
Reference Book
  • An Introduction to Bioinformatics Algorithms, by
    Neil C. Jones and Pavel A. Pevzner, MIT Press,
    2004.

4
Other Resources
  • This brand-new field lacks a wide choice of
    textbooks.
  • You will have to use Google and Wikipedia from
    time to time.
  • Bring your laptop if you need to.
  • I will try to release slides/materials before class.

5
Grading Policy
  • One midterm exam (30%)
  • Final project (30%)
  • Presentation of selected papers (25%)
  • Class participation (15%)

6
Prerequisites
  • Basic concepts in algorithms and programming.
  • Data structures
  • Big-O notation
  • e.g., O(n^2)
  • How about biology?
  • It will be introduced in a problem-based manner.

7
The Reason Why You Take This Course
8
Clarification of Terms
  • People often confuse the following terms:
  • Bioinformatics, computational biology, DNA
    computing.
  • This field has nothing to do with DNA computing
    as used in the theory of computation.
  • Computational biology was the term used first.
  • The term bioinformatics was coined later to name
    the research issues in this field.
  • Bioinformatics issues mainly refer to biological
    problems solved by computational approaches.

9
Another Science Field Under Development
  • Computational Biology (or Bioinformatics) was
    expected to be an interdisciplinary science.
  • It covers knowledge in biology, computer science,
    and statistics.
  • Dr. Eddy speculated that it is an ante-disciplinary
    science.
  • Read his article if you are confused about this
    field.

10
Double Helix
  • Discovered by Watson and Crick, Nature, 1953.
  • 900 words, 2 pages

AGCTGGCAT
11
Human Genome
  • In 2001, the International Human Genome Project
    released the first draft of the sequence of the
    human genome.
  • Note that this is only from one person.

12
Human Genome
  • 23 pairs of chromosomes
  • 22 autosomal pairs
  • the sex chromosome pair
  • XX for females and XY for males.
  • 3,000,000,000 bp

13
Shotgun Sequencing
AGCATGCTGCAGTCATGCTTAGGCTA
14
Genome Assembly
  • The experimental results are a bunch of short DNA
    fragments.
  • How can we reconstruct the original sequence from
    these short strings?

AGCATGCTGCAGTCATGCTTAGGCTA
15
Genome Assembly
  • The experimental results are a bunch of short DNA
    fragments.
  • e.g., ATGGCCGA, GAGGTA, GTAACTGGT, ...
  • The assembly problem can be formulated as
    finding the shortest superstring.
  • e.g., G C C A T A G C C T A is a superstring of
  • A T A G
  • G C C A T
  • A G C C T
  • C C T A
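
The shortest-superstring formulation above can be sketched with a greedy merge: repeatedly join the two fragments that share the longest suffix-prefix overlap. This is only an illustration under stated assumptions. The function names `overlap` and `greedy_superstring` are made up for this example, the reads are assumed to be error-free, and the greedy strategy is a heuristic that is not guaranteed to find the true shortest superstring.

```python
from itertools import permutations


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(fragments):
    """Greedy heuristic: merge the best-overlapping pair until one string is left."""
    # Remove duplicates (keeping order) and fragments contained in another fragment.
    unique = list(dict.fromkeys(fragments))
    reads = [r for r in unique if not any(r != s and r in s for s in unique)]
    while len(reads) > 1:
        # Pick the ordered pair (a, b) with the longest suffix-prefix overlap.
        k, a, b = max(
            ((overlap(a, b), a, b) for a, b in permutations(reads, 2)),
            key=lambda t: t[0],
        )
        reads.remove(a)
        reads.remove(b)
        reads.append(a + b[k:])  # glue b onto a, sharing the overlapping part
    return reads[0]


# The four fragments from the slide.
fragments = ["ATAG", "GCCAT", "AGCCT", "CCTA"]
superstring = greedy_superstring(fragments)
print(superstring, len(superstring))
```

On the four fragments from the slide, the sketch recovers the 11-letter superstring GCCATAGCCTA. Comparing all pairs in every round is quadratic, so it would not scale to millions of reads.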

16
Fragment Assembly
Given N reads, where N ~ 6 million, we need to
use a linear-time algorithm.
17
History of Shotgun Sequencing
"Let's sequence the human genome with the shotgun
strategy." (Gene Myers)
"That is impossible, and a bad idea anyway."
(Phil Green)
18
Assembly Programs
  • Phil Green's assembly program, PHRAP, was the
    popular tool used by most biologists.
  • Gene Myers proposed a new idea for accelerating
    sequencing and assembly, but the reaction was
    incredibly negative.
  • After several rejections, his paper was
    finally accepted on the condition that a
    rebuttal (by Phil Green) be published with it.

19
Celera
  • In 1998, one biotech company announced that it
    would form a company called Celera to sequence
    the genome using Myers' shotgun approach.
  • The competition between the Human Genome Project
    and Celera was launched.
  • In 1999, with a half-million lines of code,
    Celera sequenced and assembled the first genome
    of Drosophila (fruit fly).
  • In 2000, a first assembly of the human genome was
    accomplished by Celera.

20
Celera
  • Gene Myers' quote on the competition:
  • "It was an incredible time of my life. I thought I
    could do it, but I didn't yet know I could do it.
    The pressure was incredible, but so was the
    excitement."
  • In 2001, the Human Genome Project consortium and
    Celera together released the draft human genome.

21
Big Shots in This Field
22
Purpose of Analysis of DNA
  • Why do we have to extract/study DNA sequences?
  • Provide better medical treatment.
  • Disease prevention and diagnosis.
  • Personalized drug design.

23
A Huge Amount of Information from DNA sequence
  • Are all of these nucleotides functional?
  • We often refer to the functional parts as genes.

AGGCGCGGGGGGTTAAGAGCTATGCCATTTATATAAAATTTAAAGCGTA
AGAGCTATGCCATTTATATAAAATTTAAAGGCGCGGGGGGTTAAGAGGCG
CGGGGGGTTAAGAGCTATGCCATTTATATAAAATTTAAAGCGTAAGAAGC
TATGCCATTTATATAAAATTTAAAGCGTAAGAGCTATGCCATTTATATAA
AATTTAAAGAGGCGCGGGGGGTTAAGAGCTATGCCATTTATATAAAATTT
AAAGCGTAAGAGCTATGCCATTTATATACAATCTAAAGTTAAAGCGTAAG
AGCTATGCAGGCGCGGGGGGTTAAGAGCTATGCCATTTATATAAAATTTA
AAGCGTAAGAGCTATGCCATTTATATAAAATTTAAAGAGGCGCGGGGGGT
TAAGAGCTATGCCATTTATATAAAATTTAAAGCGTAAGAGCTATGCCATT
TATATAAAATTTAAAGAAAGCGTAAGAGCTATGCCATTTATATAAAATTT
AAAGGGCGCGGGGGGTTAAGAGCTATGCCATTTATATAAAATTTAAAGCG
TAAGAGCTATGCCATTTATATAAAATTTAAAGGGCGCGGGGGGTTAAGAG
CTATGCCATTAGGCGCGGGGGGTTAAGAGCTATGCCATTTATATAAAATT
TAAAGCGTAAGAGCTATGCCATTTATATAAAATTTAAAGTATAAAATTTA
AAGAGGCGCGGGGGGTTAAGAGCTATGCCATTTATATAAAATTTAAAGCG
TAAGAGCTATGCCATTTATATAAAATTTAAAGAAAGCGTAAGAGCTATGC
CATTTATATAAAATTTAAAGGGCGCGGGGGGTTAAGAGCTATGCCATTTA
TATAAAATTTAAAGCGTAAGAGCTATGCCATTTATATAAAATTTAAAGGG
CGCGGGGGGTTAAGAGCTATGCCATTTAAAATTTAAAGGCGCGGGGGGTT
AAGAGGCGCGGGGGGTTAAGAGCTATGCCATTTATATAAAATTTAAAGCG
TAAGAAGCTATGCCATTTATATAAAATTTAAAGCGTAAGAGCTATGCCAT
TTATATAAAATTTAAAGAGGCGCGGGGGGTTAAGAGCTATGCCATTTATA
TAAAATTTAAAGCGTAAGAGCTATGCCATTTATATAAAATTTAAAGTTAA
AGCGTAAGAGCTATGCAGGCGCGGGGAGCTGGGTTTATATAAAATTTA
24
Gene
  • A gene is a portion of DNA that codes for a
    protein.
  • About 21,000 genes in the human genome.

AGCCTAGTTGCAAA
DNA -> RNA -> Protein
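
The DNA to RNA to protein flow can be sketched in a few lines. This is a simplified illustration, not a gene finder. It assumes the slide's example string is the coding strand, reads codons from position 0 without looking for a start codon, and uses the standard genetic code. The names `transcribe` and `translate` are made up for this example.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """DNA -> RNA, treating `dna` as the coding strand (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(dna: str) -> str:
    """Translate complete codons from position 0; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


dna = "AGCCTAGTTGCAAA"  # example string from the slide
print(transcribe(dna))  # RNA
print(translate(dna))   # one-letter amino-acid codes
```

Real genes are more involved (introns, reading frames, the reverse strand), so this only shows the direction of the information flow.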
25
Genetic Variants
  • The human genome obtained in 2001 is only a
    reference (i.e., it originated from one person).
  • But our genomes differ from each other through
    various types of genetic variation.
  • Example: the green-eye and black-eye phenotypes
    of eye-color genes.

26
Genetic Variants
  • The genetic variants differ among members in the
    human population.

Black eye  GATATTCGTACGGA-T
Brown eye  GATGTTCGTACTGAAT
Black eye  GATATTCGTACGGA-T
Blue eye   GATATTCGTACGGAAT
Brown eye  GATGTTCGTACTGAAT
Brown eye  GATGTTCGTACTGAAT
DNA sequences of 6 individuals
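
One way to see where the six individuals differ is to scan the aligned sequences column by column and keep the positions where more than one symbol appears. The sketch below assumes the sequences are already aligned to equal length, with "-" marking a gap. The helper name `variant_sites` is made up for this example.

```python
individuals = [
    ("Black eye", "GATATTCGTACGGA-T"),
    ("Brown eye", "GATGTTCGTACTGAAT"),
    ("Black eye", "GATATTCGTACGGA-T"),
    ("Blue eye", "GATATTCGTACGGAAT"),
    ("Brown eye", "GATGTTCGTACTGAAT"),
    ("Brown eye", "GATGTTCGTACTGAAT"),
]


def variant_sites(sequences):
    """Return 0-based column indices where the aligned sequences are not all identical."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, column in enumerate(zip(*sequences)) if len(set(column)) > 1]


sequences = [seq for _, seq in individuals]
sites = variant_sites(sequences)
print("variant columns (0-based):", sites)
for phenotype, seq in individuals:
    # Show only the symbols each individual carries at the variable columns.
    print(f"{phenotype:10s}", "".join(seq[i] for i in sites))
```

Only three of the sixteen columns vary, and the symbols at those columns separate the three eye colors in this toy data set.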
28
Association Study
  • The DNA can be used for diagnosis.
  • An association study refers to the collection of
    DNA sequences from cases and controls.

-A A T T T G C T C-
-A A T C T G C T C-
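
The two sequences above differ at a single base (T versus C). A common way to ask whether such a variant is associated with disease is to compare allele counts between cases and controls with a chi-square test on a 2x2 table. The sketch below uses made-up counts purely for illustration; they are not data from the slides. The function name `chi_square_2x2` is also an assumption of this example.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column needs at least one observation")
    return n * (a * d - b * c) ** 2 / denominator


# Hypothetical allele counts at the variable site (made-up numbers, illustration only).
cases_T, cases_C = 30, 10
controls_T, controls_C = 10, 30

statistic = chi_square_2x2(cases_T, cases_C, controls_T, controls_C)
print(f"chi-square = {statistic:.2f}")
```

With one degree of freedom, a statistic above roughly 3.84 corresponds to p < 0.05, so counts as skewed as these hypothetical ones would suggest an association.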
29
Many other animals will be sequenced and aligned
30
Other Sequencing Projects
  • 2002: Mouse genome
  • 2002: Rice genome
  • 2004: Rat genome
  • 2005: Chimpanzee genome
  • Why do we need to know the DNA of other species?

31
Reconstruction of Phylogenetic History
Can we reconstruct their phylogenetic history
based on their DNA sequences?
Species: Macaque, Human, Chimpanzee, Mouse
32
Orz's Evolution
  • Given a set of orz-related words: Orz, OTZ,
    orz, crz, OTS, oO, Crz, or2.
  • Can you reconstruct the evolutionary history of
    these words?

orz
orz
Crz
Orz
Crz
or2
Orz
33
Reconstruction of Phylogenetic Tree
A tree with minimum total number of changes
(e.g., mutation or reversal).
orz
Orz
Crz
or2
OTZ
oO
STO
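
A rough way to explore the "minimum total number of changes" idea on these words is to measure the edit distance between every pair and connect them with a minimum spanning tree. This is a simpler stand-in, not true maximum parsimony, which would also infer unobserved ancestral words. The function names are made up for this example, and Prim's algorithm is simply one convenient choice.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of substitutions, insertions, and deletions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # match or substitute
            ))
        previous = current
    return previous[-1]


def minimum_spanning_tree(words):
    """Prim's algorithm on the complete graph weighted by edit distance."""
    in_tree = [words[0]]
    remaining = list(words[1:])
    edges = []
    while remaining:
        # Cheapest edge from the tree built so far to a word not yet in it.
        cost, parent, child = min(
            ((edit_distance(p, c), p, c) for p in in_tree for c in remaining),
            key=lambda t: t[0],
        )
        edges.append((parent, child, cost))
        in_tree.append(child)
        remaining.remove(child)
    return edges


words = ["orz", "Orz", "Crz", "or2", "OTZ", "oO", "STO"]
tree = minimum_spanning_tree(words)
for parent, child, cost in tree:
    print(f"{parent} -> {child} ({cost} change(s))")
print("total changes:", sum(cost for _, _, cost in tree))
```

The tree is rooted arbitrarily at the first word, so the direction of each edge says nothing about which word is older. Several different trees share the same minimum total.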
34
Syllabus
  • Classical and advanced alignment algorithms
  • Space Saving Strategy
  • BLAST and PatternHunter
  • Genome assembly algorithms
  • Evolutionary analysis
  • Hidden Markov Models
  • Other Selected Topics

35
Happy Marriage?
  • Bioinformatics is usually compared to a marriage
    between computer science and biology.
  • But, as with all couples, after more than ten
    years of marriage, ...

37
Computer Scientists vs Biologists
  • In biology, (almost) nothing is ever completely
    true or false.
  • In computer science, everything is either true or
    false.

38
Computer Scientists vs Biologists
  • Biologists strive to understand the very
    complicated, very messy natural world.
  • Computer scientists seek to build their own clean
    and organized virtual worlds.

39
Computer Scientists vs Biologists
  • Biologists are more data driven.
  • Computer scientists are more algorithm driven.
  • One consequence is that CS web pages have fancier
    graphics while biology web pages have more
    content.

40
Computer Scientists vs Biologists
  • Biologists are obsessed with being the first to
    discover something.
  • Computer scientists are obsessed with being the
    first to invent, improve, or prove something.

41
Computer Scientists vs Biologists
  • Biologists are quite comfortable with the idea
    that all data has errors.
  • Computer scientists are not.

42
Computer Scientists vs Biologists
  • Computer scientists get high-paying jobs after
    graduation.
  • Biologists typically have to complete one or more
    post-docs...

43
Deciphering DNA
  • If you were ever curious about life, destiny, ...,
    you might consider studying DNA from now on.

44
A Guest
  • Prof. Ting will join us and show how to behave
    like a good student and lecturer.
  • We have to find a new time slot to welcome him.