Outline - PowerPoint PPT Presentation

About This Presentation

Title:

Outline

Slides: 26

Provided by: danb195

Transcript and Presenter's Notes

Title: Outline

1
Introduction
2
Outline

Topics for this class
Course logistics
A little background
Course topic overview
CS 482/682 will be noticeably different from
previous years.
The instructor has changed, and the two
instructors specialize in different areas of
bioinformatics.

3
COURSE LOGISTICS
4
Course staff

Instructor: Bin Ma (DC 3345, binma_at_uwaterloo.ca,
http://www.cs.uwaterloo.ca/binma)
TA: Xi Han
Course webpage: monod.uwaterloo.ca/cs482
Prerequisites: For undergraduates, the two most
important prerequisites are CS 341 and STAT 231.

5
Marking

For undergrads:
4 assignments (40%)
In-class midterm (20%)
Final exam or final project (40%)
The midterm will be held on 27 October.
For grad students:
4 assignments (40%)
1 final project, done individually (60%),
consisting of:
A proposal (due Nov. 1)
A final report
A presentation in class
Undergraduates can do projects too; the project
is then worth 40%.

6
Textbooks, notes

Textbook: R. Durbin, S. Eddy, A. Krogh, G.
Mitchison, Biological Sequence Analysis:
Probabilistic Models of Proteins and Nucleic
Acids, Cambridge University Press, 1999, ISBN
0521629713.
This is a classic book in this area.
Another book that is useful, although not
required, is:
Dan Gusfield, Algorithms on Strings, Trees and
Sequences: Computer Science and Computational
Biology, Cambridge University Press, 1997, ISBN
0521585198.
Many other books are either too specialized or
of low quality.
Much of the course material is not covered by
any textbook.
Notes
The notes serve as an outline of the lecture
material. They cannot replace the lectures.
Notes will appear on the web soon after they are
presented in class, with corrections (!)

7
BRIEF REVIEW OF BIOLOGY
8
A brief review of biology

Modern molecular biology studies a few types of
biologically important molecules: DNA, RNA,
protein, lipid, glycan.
Bioinformatics has mostly studied DNA, then RNA
and protein, and to a lesser extent lipids and
glycans.
The first three have sequences as their primary
structures.

9
DNA
[Figure: DNA strands with their 5' and 3' ends
labeled]
The G-C base pair is stronger than the A-T base
pair.
10
DNA

Three reasons for DNA's popularity in
bioinformatics:
It is the most important information-carrying
molecule: it passes information to children and
is responsible for many genetic diseases.
It is the simplest to model in a computer.
DNA is modeled as a string over {A, C, G, T}.
In bioinformatics, "sequence" is used more often
than "string". Why?
Its data is the cheapest to obtain.
It is predicted that, in the near future, a
human's complete genome (3 Gbp) can be sequenced
for less than 1000 dollars in a day.
Bioinformatics played a key role.
Google donated an X-prize (http://www.xprize.org/).
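
The "string over A, C, G, T" model can be made concrete with a few lines of Python. This is only a minimal sketch: the function names are hypothetical, and the example sequence is the one that appears later in these slides. It checks the alphabet, computes the reverse complement (the same region read from the opposite strand), and measures the fraction of G and C bases.

```python
# Minimal sketch: DNA modeled as a string over the alphabet {A, C, G, T}.
# All names here are illustrative, not part of the course material.

DNA_ALPHABET = frozenset("ACGT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")  # A<->T, C<->G base pairing


def is_dna(seq: str) -> bool:
    """Return True if seq is non-empty and uses only A, C, G, T."""
    return bool(seq) and set(seq) <= DNA_ALPHABET


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read in its own 5' to 3' direction."""
    if not is_dna(seq):
        raise ValueError("not a DNA string over A, C, G, T")
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Return the fraction of bases that are G or C."""
    if not is_dna(seq):
        raise ValueError("not a DNA string over A, C, G, T")
    return (seq.count("G") + seq.count("C")) / len(seq)


example = "ACCGATTGAGCCGTACC"
print(reverse_complement(example))
print(round(gc_content(example), 3))
```

Real sequence data often contains extra symbols (for example N for an unknown base), which this sketch deliberately rejects.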

11
RNA

RNA was less studied before but is now becoming
more and more important.
The structure is important to RNA's function.
It is not a simple string anymore.

12
Protein
Primary structure is a sequence. 20 frequent
amino acids. Fold into a complex 3D structure.
13
Protein

Protein is the most important molecule for the
life of an organism:
Structural components.
Proteins participate in almost all chemical
reactions in cells as enzymes (catalysts). They
allow the organism to react to the environment
through sophisticated signaling pathways.
Proteins are directly responsible for most
diseases (genetic or not) and are the main drug
targets for diseases including Alzheimer's and
cancer.
Protein has become extremely popular in
bioinformatics:
Post-genome era
Genomics vs. proteomics
Hard to study, partly because structure is
significant to the function,
and because, until recently, the data was more
expensive to obtain.

14
An example
HER2 is a proto-oncogene found on chromosome 17.
It encodes a protein that functions as a cell
membrane receptor.
Normal epithelial cells express low levels of
HER2 receptor on the cell surface. Some types of
breast cancer cells, however, overexpress this
gene. This signals the tumor cells to
proliferate (grow).
15
An example
16
Read more by yourself

If you do not have much biology background, read
the following articles (and other related
articles) on Wikipedia:
Protein, DNA, RNA, gene, genome, genetic code,
tRNA.
We will briefly review the necessary biology
knowledge when needed.

17
COURSE TOPICS

Keywords: algorithm, sequence, phylogeny, protein
sequencing

18
Keyword 1: algorithm

This is a bioinformatics course focusing on
biological sequence analysis algorithms.
How bioinformatics is used in biology:
Sample → data → software → discovery
Bioinformatics research cycle:
biological problem → math model → algorithm →
software → biology
Normally the data is so large, or the model so
complex, that an efficient algorithm is needed.
Polynomial time is no longer good enough.
Sometimes even linear time is not good enough.
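
A back-of-the-envelope calculation shows why polynomial time alone is not reassuring at genome scale. The sketch below assumes a machine that performs 10^9 basic operations per second (an assumption chosen only for illustration) and an input of 3 x 10^9 symbols, the human genome size quoted earlier in these slides. Constant factors are ignored.

```python
import math

# Rough estimate only: assumes 1e9 basic operations per second and
# ignores constant factors, memory, and I/O. The function name is
# hypothetical and exists just for this illustration.


def estimate_runtimes(n, ops_per_second=1e9):
    """Return rough running times in seconds for three growth rates."""
    operation_counts = {
        "linear": n,
        "n log n": n * math.log2(n),
        "quadratic": n ** 2,
    }
    return {name: count / ops_per_second
            for name, count in operation_counts.items()}


genome_length = 3e9  # about 3 Gbp, as quoted for the human genome
seconds_per_year = 365 * 24 * 3600

for name, seconds in estimate_runtimes(genome_length).items():
    years = seconds / seconds_per_year
    print(f"{name:>9}: {seconds:.3g} seconds (about {years:.3g} years)")
```

Under these assumptions a linear scan takes seconds, while a quadratic algorithm on the same input would run for centuries, which is why the choice of algorithm matters more than the choice of hardware.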

19
An example role of bioinformatics
[Diagram: protein sample → (mass spectrometry) →
data → (bioinformatics) → protein information]

Interesting protein information includes:
Protein identity
Protein quantity
PTMs (post-translational modifications) on
proteins
These are useful for disease study and drug
development.
20
Keyword 2: sequence

Fundamental information storage method in living
cells: DNA sequences.
Central dogma of molecular biology: DNA → RNA →
protein
Hence, to understand an organism, it helps to
start out by understanding DNA sequences.
We can treat DNA sequences as strings.
ACCGATTGAGCCGTACC
So we're going to spend most of the course
learning about algorithms for strings and
sequences.
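
The central dogma can be sketched directly on strings. The code below is a simplified illustration, not a model of the real cellular machinery: it assumes the DNA string given is the coding strand (so transcription is just T to U), uses the standard genetic code, starts translating at the first base rather than searching for a start codon, and stops at the first stop codon. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional U, C, A, G ordering
# of codons. "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """DNA (coding strand) to RNA: replace every T with U."""
    return dna.replace("T", "U")


def translate(rna: str) -> str:
    """RNA to protein: read codons from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ACCGATTGAGCCGTACC"  # the example string from the slide
rna = transcribe(dna)
print(rna)
print(translate(rna))
```

With the slide's example string, translation ends early because the third codon (UGA) is a stop codon. Trailing bases that do not fill a complete codon are ignored.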

21
Keyword 3: phylogeny

Darwin's theory of evolution tells us that all
species share the same ancestor.
Knowing only the currently living species,
especially their DNA sequences, reconstruct the
evolutionary tree,
without digging up fossils.

22
Keyword 4: protein sequencing

We will also talk about protein sequencing.
Proteins are the construction material and the
controls of a living organism. They determine
the phenotype (as opposed to the genotype).
Consider genes as source code and proteins as
running programs (processes).
We will study how to read the sequence
information of a protein from a biological
sample.
Very interesting algorithms.

They have the same genome!
23
Bioinformatics General Topics

This is not a general course in bioinformatics,
which has become a very broad area:
Genome sequencing.
Sequence comparison
Gene prediction and annotation
Gene expression and biomarker
Motif finding
Regulatory network
Protein structure comparison and prediction (CS
483/683)
Protein-protein interaction
Protein identification and quantification with
mass spectrometry
RNA structure, RNA gene prediction, RNAi.
Glycans and lipids
Genetic variations: SNPs, alternative splicing,
and diseases.
Phylogeny
Genome evolution
Medical/Cell image processing Molecular
simulation (bioinformatics? health
informatics?)
DNA computing. (a different area than
bioinformatics)

24
Specific topics

Pairwise alignment: Which parts of two sequences
are surprisingly similar to each other, if
they've been evolving away from each other?
Phylogenetic reconstruction: How do I build
evolutionary trees? How do I know they're the
right ones?
Multiple alignment: Like pairwise alignment, only
with multiple sequences. How to make this useful
in the context of evolution?
Gene finding: Which part of a DNA sequence is
actually part of the process of producing
proteins?
Protein sequencing: How to identify the protein
sequence from biological samples (wet lab →
data)?
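
As a preview of the pairwise alignment topic, here is a minimal sketch of local alignment by dynamic programming (the Smith-Waterman recurrence). The scoring values (match +2, mismatch -1, gap -2 per position) are arbitrary assumptions for illustration, and the function name is hypothetical. The sketch finds the best-scoring pair of similar regions, but it does not address whether that score is "surprising" in a statistical sense.

```python
def local_alignment(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,                                   # start a new alignment
                score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,               # gap in b
                score[i][j - 1] + gap,               # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        substitution = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + substitution:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(local_alignment("GGGACGTACCC", "TTACGTATT"))
```

The table has (len(a) + 1) x (len(b) + 1) cells, so time and memory grow with the product of the two lengths. That is fine for short sequences and is exactly the kind of cost that becomes a problem at genome scale.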