Title: Introduction to Bioinformatics and its Applications
1 Introduction to Bioinformatics and its Applications
- Mohamad Rabbath
- MSc Informatics, University of Freiburg; Software Engineer at OFFIS
2 Outline
- Introduction
- State of the Art in Bioinformatics
- Bioinformatics Applications
- Sequence Alignment and Phylogenetic Trees
- RNA Algorithms
- Protein Structure Prediction Studies
- Summary
- References
- Demo (optional): http://cpsp.informatik.uni-freiburg.de:8080/index.jsp
3 Introduction
Biological Data
Computing Power
4 Introduction
- Why do we need computers to analyze biological data?
- A very large amount of biological data requires fast algorithms and substantial computational resources
- Biological data is growing exponentially
5 Introduction
[Figure: GenBank growth statistics. Source: http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html]
6 Introduction
- Why do we need computers to analyze biological data?
- A very large amount of biological data requires fast algorithms and substantial computational resources
- Biological data is growing exponentially
- There is a growing gap between the number of known sequences and the number of known structures (RNA and proteins)
- Computational structure prediction costs less than experimental structure determination
7 Introduction
- What skills are required for bioinformatics?
- Deep knowledge of database design and implementation
- Extensive programming skills and the ability to design complex software systems and algorithms (languages such as C, Perl, Python, and Java are widely used in bioinformatics)
- Strong knowledge of mathematics and statistics
- Some background in biology
8 State of the Art in Bioinformatics
- Bioinformatics was a key element in completing the Human Genome Project in 2003
- Much current research studies RNA and protein structures
- Sequencing of individual human genomes (the 23andMe project is a very good example)
9 Bioinformatics Applications
- Drug discovery and design
- Sequence alignment
- RNA secondary structure prediction
- Protein structure prediction
- Reducing healthcare costs (e.g., early detection of genetic diseases)
10 Sequence Alignment
- Arranging sequences to expose regions of similarity that may be a consequence of functional or evolutionary relationships
- Very similar to alignment problems in natural language processing
11 Sequence Alignment
- A T G A A C C G C C T A A G C G G C A -- G
- A T G -- -- C C G A C T A -- A C G G A A G
- Three operations are defined
- Substitution
- Insertion
- Deletion
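As a minimal sketch (not part of the original slides), the three operations can be read off a finished alignment column by column. It assumes a single `-` per gap position and, since the slides do not fix a convention, treats a gap in the first row as an insertion and a gap in the second row as a deletion. `count_operations` is a hypothetical helper name.

```python
from collections import Counter

GAP = "-"  # assumption: one '-' marks one gap position


def count_operations(top: str, bottom: str) -> Counter:
    """Classify every column of a pairwise alignment.

    Assumed convention: reading the alignment as an edit of `top` into
    `bottom`, a gap in `top` is an insertion and a gap in `bottom` a deletion.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    ops = Counter()
    for x, y in zip(top, bottom):
        if x == GAP and y == GAP:
            raise ValueError("a column cannot contain two gaps")
        elif x == GAP:
            ops["insertion"] += 1
        elif y == GAP:
            ops["deletion"] += 1
        elif x == y:
            ops["match"] += 1
        else:
            ops["substitution"] += 1
    return ops


# 3 matches, 1 insertion, 1 deletion, 1 substitution
print(count_operations("AT-GCA", "ATCG-T"))
```

Matches are counted separately because they cost nothing in the usual edit model; only the three operations listed above contribute to the alignment cost.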
12 Sequence Alignment: Needleman-Wunsch
- Three steps in dynamic programming
- Initialization
- Matrix fill (scoring)
- Traceback (alignment)
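- Recurrence:
$$M_{i,j} = \min\begin{cases} M_{i-1,j-1} + S_{i,j} & \text{(substitution)}\\ M_{i,j-1} + w & \text{(deletion)}\\ M_{i-1,j} + w & \text{(insertion)} \end{cases}$$
- where $S_{i,j}$ is the cost of aligning the $i$-th symbol of the first sequence with the $j$-th symbol of the second (typically 0 for a match) and $w$ is the gap cost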
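The three steps and the minimum recurrence above can be sketched in Python as follows. This is an illustrative sketch assuming unit costs (gap cost `w = 1`, mismatch cost 1, match cost 0), under which the optimal cost equals the edit distance; the function name `needleman_wunsch` and its parameters are hypothetical.

```python
def needleman_wunsch(a: str, b: str, w: int = 1, mismatch: int = 1):
    """Minimum-cost global alignment of a and b.

    Assumed costs: S_ij = 0 for a match and `mismatch` otherwise; each gap costs w.
    Returns (cost, aligned_a, aligned_b) with '-' marking gaps.
    """
    n, m = len(a), len(b)

    def s(i: int, j: int) -> int:
        # Substitution cost of a[i-1] vs b[j-1] (matrix indices are 1-based)
        return 0 if a[i - 1] == b[j - 1] else mismatch

    # 1. Initialization: a prefix aligned against the empty string costs one gap per symbol
    M = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        M[i][0] = i * w
    for j in range(1, m + 1):
        M[0][j] = j * w

    # 2. Matrix fill using the minimum recurrence
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = min(M[i - 1][j - 1] + s(i, j),  # substitution (or match)
                          M[i][j - 1] + w,            # b[j-1] against a gap
                          M[i - 1][j] + w)            # a[i-1] against a gap

    # 3. Traceback from the bottom-right cell to recover one optimal alignment
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and M[i][j] == M[i - 1][j - 1] + s(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif j > 0 and M[i][j] == M[i][j - 1] + w:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
        else:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
    return M[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ATGC", "ATC"))  # (1, 'ATGC', 'AT-C')
```

The traceback prefers the diagonal move when several predecessors give the same cost, so it returns one of possibly several optimal alignments.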
13 Sequence Alignment: Needleman-Wunsch
14 RNA Secondary Structure Prediction
- RNA plays a central role in the flow of biological information (DNA → RNA → protein)
15 RNA Secondary Structure Prediction
- RNA sequence: GGGCGUGGGCGUAGUCGU
- RNA structure
16 RNA Secondary Structure Prediction
17 RNA Secondary Structure Prediction: Nussinov
- Idea: maximize the number of base pairs
18 RNA Secondary Structure Prediction: Nussinov
[Figure: Nussinov dynamic-programming matrix for the sequence GCACGACG]
19 RNA Secondary Structure Prediction: Nussinov
[Figure: Nussinov dynamic-programming matrix for the sequence GCACGACG]
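The base-pair maximization illustrated on slides 17-19 can be sketched as a small dynamic program with traceback. The sketch assumes Watson-Crick pairs plus G-U wobble pairs and a minimum hairpin loop of 3 unpaired bases (`min_loop`); both are common choices, not taken from the slides, and `nussinov` is a hypothetical function name.

```python
# Allowed base pairs: Watson-Crick plus G-U wobble (an assumption; variants differ)
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3):
    """Maximize the number of non-crossing base pairs.

    N[i][j] = max(N[i][j-1],                                  # j unpaired
                  N[i][k-1] + N[k+1][j-1] + 1 for each k      # j pairs with k
                  with (seq[k], seq[j]) in PAIRS and j - k > min_loop)
    Returns (maximum number of pairs, dot-bracket structure).
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0, ""
    N = [[0] * n for _ in range(n)]

    def pair_score(i: int, j: int, k: int) -> int:
        left = N[i][k - 1] if k > i else 0
        return left + N[k + 1][j - 1] + 1

    # Fill by increasing span; spans <= min_loop cannot contain a pair and stay 0
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = N[i][j - 1]
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    best = max(best, pair_score(i, j, k))
            N[i][j] = best

    # Traceback: recover one optimal structure in dot-bracket notation
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if N[i][j] == N[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS and N[i][j] == pair_score(i, j, k):
                structure[k], structure[j] = "(", ")"
                stack.append((i, k - 1))
                stack.append((k + 1, j - 1))
                break
    return N[0][n - 1], "".join(structure)


print(nussinov("GGGAAAUCC"))  # (3, '(((...)))')
```

Because only one structure is returned and crossing pairs are never considered, this sketch also exhibits the limitations listed on the next slide. For the sequence GCACGACG, the assumed loop constraint leaves room for only a single G-C pair.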
20 RNA Secondary Structure Prediction: Limitations of Nussinov
- Base-pair maximization does not necessarily yield biologically relevant structures
- Only one structure is predicted
- Crossing structures (pseudoknots) cannot be predicted
21 RNA Secondary Structure Prediction: Zuker
22 Protein Structure Prediction
- The primary structure is a sequence of amino acids
- The 3D structure determines the function
23 Protein Structure Prediction: HP (Hydrophobic-Polar) Model
- The chemical group R (the side chain of the amino acid) gives each amino acid its unique properties
- Hydrophobic amino acids tend to cluster together
- The HP model reduces the 20 amino acids to two classes: hydrophobic (H) and polar (P)
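A sketch of the reduction from 20 amino acids to two classes. Which amino acids count as hydrophobic varies between publications; the set `AVLIMFWC` used here is an assumption, and `to_hp` is a hypothetical helper.

```python
# The 20 standard amino acids (one-letter codes)
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")
# Assumption: one common choice of hydrophobic residues; published HP mappings differ
HYDROPHOBIC = set("AVLIMFWC")


def to_hp(protein: str) -> str:
    """Map a protein sequence to an HP string ('H' hydrophobic, 'P' polar)."""
    seq = protein.upper()
    unknown = set(seq) - STANDARD
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return "".join("H" if aa in HYDROPHOBIC else "P" for aa in seq)


print(to_hp("MKVLA"))  # HPHHH
```

Everything except the H/P pattern is discarded, which is what makes the lattice search on the following slides tractable for short sequences.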
24 Protein Structure Prediction: 3D Lattice Representation
[Figure: cubic lattice and face-centered cubic (FCC) lattice]
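On a lattice, a conformation is a self-avoiding chain of lattice points, and the usual HP energy adds -1 for every pair of H monomers that are lattice neighbors but not consecutive in the chain. The sketch below evaluates that energy for given coordinates on either lattice; the neighbor vectors (6 for cubic, 12 for FCC) are standard, while `hp_energy` and its argument format are hypothetical.

```python
from itertools import product

# Neighbor vectors: cubic = 6 unit steps, FCC = 12 vectors such as (1, 1, 0)
CUBIC = {v for v in product((-1, 0, 1), repeat=3) if sum(map(abs, v)) == 1}
FCC = {v for v in product((-1, 0, 1), repeat=3) if sum(map(abs, v)) == 2}


def hp_energy(hp: str, coords: list, lattice: set) -> int:
    """HP energy: -1 per lattice contact between non-consecutive H monomers."""
    if len(hp) != len(coords):
        raise ValueError("one coordinate per monomer is required")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")

    def adjacent(p, q) -> bool:
        return tuple(b - a for a, b in zip(p, q)) in lattice

    # Consecutive monomers must occupy neighboring lattice points
    if not all(adjacent(p, q) for p, q in zip(coords, coords[1:])):
        raise ValueError("consecutive monomers must be lattice neighbors")

    energy = 0
    for i in range(len(hp)):
        for j in range(i + 2, len(hp)):  # skip chain neighbors (j == i + 1)
            if hp[i] == "H" and hp[j] == "H" and adjacent(coords[i], coords[j]):
                energy -= 1
    return energy


square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(hp_energy("HPPH", square, CUBIC))  # -1
```

The FCC lattice allows triangles (three mutually adjacent points), which the cubic lattice does not, so the same HP sequence can reach different optimal energies on the two lattices.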
25 Protein Structure Prediction: How to Achieve Prediction?
- Folding simulation
- Hidden Markov models and other stochastic models
  - Statistical models do not guarantee optimality
  - They run in polynomial time
- Constraint programming approach
  - Optimality is guaranteed
  - NP-completeness makes it time-consuming
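To illustrate why guaranteed optimality is expensive, the sketch below finds the optimal HP energy on the cubic lattice by plain exhaustive enumeration of self-avoiding walks. This is not the constraint programming approach the slides describe, only a brute-force baseline whose running time grows exponentially with sequence length, so it is usable only for very short sequences. It also counts how many walks reach the optimum; because rotated and mirrored copies are counted separately, this raw count differs from a degeneracy count that merges symmetric copies. `optimal_hp_structures` is a hypothetical name.

```python
# Cubic-lattice moves (assumption: only the cubic lattice is searched here)
MOVES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def optimal_hp_structures(hp: str):
    """Brute-force search over all self-avoiding walks starting at the origin.

    Returns (minimum HP energy, number of walks reaching it); symmetric
    copies of a structure are counted as separate walks.
    """
    n = len(hp)
    if n == 0:
        return 0, 0
    occupied = {(0, 0, 0): 0}  # lattice point -> monomer index
    best = {"energy": 0, "count": 0}  # 0 is the highest possible HP energy

    def new_contacts(pos, idx) -> int:
        # H-H contacts created by placing monomer idx at pos (chain predecessor excluded)
        if hp[idx] != "H":
            return 0
        total = 0
        for dx, dy, dz in MOVES:
            k = occupied.get((pos[0] + dx, pos[1] + dy, pos[2] + dz))
            if k is not None and k < idx - 1 and hp[k] == "H":
                total += 1
        return total

    def extend(pos, idx, energy):
        if idx == n:  # complete walk: update the optimum and its count
            if energy < best["energy"]:
                best["energy"], best["count"] = energy, 1
            elif energy == best["energy"]:
                best["count"] += 1
            return
        for dx, dy, dz in MOVES:
            nxt = (pos[0] + dx, pos[1] + dy, pos[2] + dz)
            if nxt in occupied:  # self-avoidance
                continue
            occupied[nxt] = idx
            extend(nxt, idx + 1, energy - new_contacts(nxt, idx))
            del occupied[nxt]

    extend((0, 0, 0), 1, 0)
    return best["energy"], best["count"]


print(optimal_hp_structures("HPPH"))  # (-1, 24)
```

For HPPH the single H-H contact requires a U-shaped walk: 6 choices for the first step, 4 perpendicular choices for the second, and the third step is then forced, giving 6 × 4 = 24 optimal walks.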
26 Protein Structure Prediction: Why the Constraint Approach?
- The solution is not unique (optimality is targeted)
- The problem space is relatively small
- It is an offline problem, so long running times are acceptable
27 Protein Structure Prediction: Enhanced Constraint Approach
28 Protein Structure Prediction: Why Degeneracy and Protein-like Sequences?
[Figure: protein-like sequences in the cubic lattice; degeneracy]
29 Summary
- Bioinformatics is a promising field that results from the marriage of computer science and biology
- The exponential growth of biological data requires computational analysis
- Prediction problems remain open, especially for RNA and proteins
- Designing fast algorithms, strong programming skills, and database design are the main skills required to develop bioinformatics software
- Both statistical and constraint-based approaches are used to tackle problems in bioinformatics
30 References
- Peter Clote and Rolf Backofen. Computational Molecular Biology: An Introduction.
- Martin Mann, Mohamad Rabbath, Cameron Smith, Marlien Edwards, Sebastian Will, Rolf Backofen. CPSP-web-tools: a server for 3D lattice protein studies.