No slide title - PowerPoint PPT Presentation


Provided by: jlr65


Transcript and Presenter's Notes

Title: No slide title

1
What is Bioinformatics?
Answer: There is
NO single answer
2
The biologist's answer: The computer is a TOOL
that helps me to make biological discoveries
-> the computer scientist is a SERVICE PROVIDER
3
The computer scientist's answer: Bioinformatics
means RESEARCH in computing science, inspired
by biological problems
4
ATTGCTTTGATTGCTTTG... ...ATTGATTTGCAAAGCAAT...
Biologist's question: find the
repetitions within this sequence
5
ATTGCTTTGATTGCTTTG... ...ATTGATTTGCAAAGCAAT...
ATTGCTTT
G
ATTGCTTTG
A
TAACGAAAC
GTTTAGTT
6
DATA COMPRESSION
I am stupid and because I am stupid, I can't
even tell you that I am stupid
1 and because 1, I can't even tell you that 1
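The substitution on this slide can be sketched in a few lines of Python. This is only an illustration of replacing a repeated phrase with a short token; the function names `compress` and `decompress` are made up for this example and do not come from the slides.

```python
def compress(text: str, phrase: str, token: str) -> str:
    """Replace every occurrence of `phrase` with the shorter `token`."""
    if token in text:
        # The token must be unambiguous, otherwise decompression would be wrong.
        raise ValueError("token already occurs in the text; pick another one")
    return text.replace(phrase, token)


def decompress(text: str, phrase: str, token: str) -> str:
    """Undo compress() by expanding the token back into the phrase."""
    return text.replace(token, phrase)


sentence = "I am stupid and because I am stupid, I can't even tell you that I am stupid"
packed = compress(sentence, "I am stupid", "1")
print(packed)                          # 1 and because 1, I can't even tell you that 1
print(len(sentence), "->", len(packed))
```

The text only shrinks because the phrase is repeated, which is the link between compression and repetition finding.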
7
There exist thousands of algorithms to find:
Common words in a text
Longest common words
Approximate common words
Approximate longest common words
...
But ... "Inverted repeats"?
8
This prompted research by mathematicians / computer scientists:
sequence compression -> less disk space
if a sequence can be compressed, then there are repetitions
finding new algorithms (approximate matches, inverted repeats)
All this is transparent to the biologist. The algorithm must find
repetitions, period.
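A minimal sketch of what "find repetitions" can mean for exact matches, assuming fixed-length words (k-mers) and treating an inverted repeat as a word whose reverse complement also occurs in the sequence. The function names are hypothetical, and `seq` simply joins the two fragments shown on the earlier slide for illustration; approximate matching is not covered.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then read the strand backwards."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_positions(seq: str, k: int) -> dict:
    """Map every word of length k to the list of positions where it starts."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return positions


def direct_repeats(seq: str, k: int) -> dict:
    """Words of length k that occur more than once."""
    return {w: p for w, p in kmer_positions(seq, k).items() if len(p) > 1}


def inverted_repeats(seq: str, k: int) -> dict:
    """Pairs (word, reverse complement) that both occur in the sequence."""
    positions = kmer_positions(seq, k)
    found = {}
    for word, where in positions.items():
        rc = reverse_complement(word)
        if rc in positions and word <= rc:  # report each pair only once
            found[(word, rc)] = (where, positions[rc])
    return found


# Assumption: the two slide fragments are concatenated just for this demo.
seq = "ATTGCTTTGATTGCTTTG" + "ATTGATTTGCAAAGCAAT"
print(direct_repeats(seq, 9))
print(inverted_repeats(seq, 9))
```

A word that equals its own reverse complement is reported as an inverted repeat with itself, which may or may not be wanted in practice.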
9
WHY BIOINFORMATICS?
Bioinformatics is not new: population genetics,
the study of the distribution of and change in
allele frequencies (allele = variant of a
gene) -> modelling, simulation
Today: a consequence of massive data throughput:
sequencing, microarrays ... and literature
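As an illustration of simulating a change in allele frequencies, here is a sketch of random genetic drift under the Wright-Fisher model. The choice of model is an assumption made for this example (the slide names none), and the function name and parameters are hypothetical.

```python
import random


def wright_fisher(pop_size: int, freq: float, generations: int, seed: int = 0) -> list:
    """Track the frequency of one allele in a diploid population of fixed size.

    Each generation, the 2 * pop_size gene copies are drawn at random
    from the previous generation (drift only: no selection, no mutation).
    """
    rng = random.Random(seed)
    n_copies = 2 * pop_size
    history = [freq]
    for _ in range(generations):
        count = sum(rng.random() < freq for _ in range(n_copies))
        freq = count / n_copies
        history.append(freq)
    return history


trajectory = wright_fisher(pop_size=50, freq=0.5, generations=100)
print(trajectory[0], trajectory[-1])
```

In small populations the frequency tends to wander to 0 or 1 and then stays there, since no new variants are introduced in this sketch.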
10
Sequencing a genome consists in determining the
precise ORDER of the nucleotides, or "bases"
(A, T, G, C), along the chromosome(s)
How many bases? Is it difficult? What's
inside a genome sequence?
11
(No Transcript)
12
The molecule to be sequenced (e.g. a
chromosome) must be cut into small fragments
(about 1000 bases). Each fragment is sequenced.
13
(No Transcript)
14
A difficult problem in higher organisms:
Repeated sequences
15
(No Transcript)
16
GENES are the most important portions of the
chromosomes
Genes that code for PROTEINS are the most important
Proteins = muscles, hair, nails, ENZYMES
17
(No Transcript)
18
A simple way to find genes in bacteria
19
Open Reading Frames > 300 bases in the bacterium
Rhizobium meliloti
Stop codons: TAA, TAG and TGA
The genome of the bacterium is GC rich
-> stop codons are rare
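A minimal sketch of scanning for long open reading frames, assuming an ORF is simply a stop-free stretch in one of the three forward reading frames (start codons are ignored). The function name `open_reading_frames` and the `min_len` parameter are hypothetical; the reverse strand could be handled by passing the reverse complement.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def open_reading_frames(seq: str, min_len: int = 300) -> list:
    """Return (frame, start, end) for stop-free stretches longer than min_len bases.

    `end` is exclusive and points at the stop codon (or at the end of the
    last complete codon if the sequence runs out first).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = frame
        for pos in range(frame, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if pos - start > min_len:
                    orfs.append((frame, start, pos))
                start = pos + 3  # next stretch begins after the stop codon
        # Stretch still open when the sequence ends.
        end = frame + ((len(seq) - frame) // 3) * 3
        if end - start > min_len:
            orfs.append((frame, start, end))
    return orfs


toy = "ATG" + "GCC" * 5 + "TAA" + "GCG" * 2
print(open_reading_frames(toy, min_len=10))
```

Because TAA, TAG and TGA are AT-rich, a GC-rich genome yields many long stop-free stretches by chance, so a length threshold alone leaves plenty of candidates.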
20
A long Open Reading Frame does not necessarily
mean a gene ...
21
Q: How to choose between different ORFs?
22
An example of STYLE: codon usage
Not all codons are equal
Most of the programs aiming
at finding genes use MARKOV MODELS
23
Codon, amino acid, fraction among synonymous codons (* = stop codon)
UUU F 0.57   UCU S 0.15   UAU Y 0.57   UGU C 0.45
UUC F 0.43   UCC S 0.15   UAC Y 0.43   UGC C 0.55
UUA L 0.13   UCA S 0.12   UAA * 0.64   UGA * 0.29
UUG L 0.13   UCG S 0.15   UAG * 0.07   UGG W 1.00
CUU L 0.10   CCU P 0.16   CAU H 0.57   CGU R 0.38
CUC L 0.10   CCC P 0.12   CAC H 0.43   CGC R 0.40
CUA L 0.04   CCA P 0.19   CAA Q 0.35   CGA R 0.06
CUG L 0.50   CCG P 0.52   CAG Q 0.65   CGG R 0.10
AUU I 0.51   ACU T 0.17   AAU N 0.45   AGU S 0.15
AUC I 0.42   ACC T 0.44   AAC N 0.55   AGC S 0.28
AUA I 0.07   ACA T 0.13   AAA K 0.77   AGA R 0.04
AUG M 1.00   ACG T 0.27   AAG K 0.23   AGG R 0.02
GUU V 0.26   GCU A 0.16   GAU D 0.63   GGU G 0.34
GUC V 0.22   GCC A 0.27   GAC D 0.37   GGC G 0.40
GUA V 0.15   GCA A 0.21   GAA E 0.69   GGA G 0.11
GUG V 0.37   GCG A 0.36   GAG E 0.31   GGG G 0.15
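In the table, the values for each amino acid add up to about 1, so each number reads as the share of a codon among the codons for the same amino acid. A sketch of how such a table could be computed from a coding sequence, using the standard genetic code; the function name `codon_usage` is hypothetical and this is not necessarily how the slide's table was produced.

```python
from collections import Counter
from itertools import product

# Standard genetic code in the usual T, C, A, G ordering ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_usage(coding_seq: str) -> dict:
    """Fraction of each codon among the codons coding for the same amino acid."""
    seq = coding_seq.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    counts = Counter(c for c in codons if c in GENETIC_CODE)
    per_amino_acid = Counter()
    for codon, n in counts.items():
        per_amino_acid[GENETIC_CODE[codon]] += n
    return {c: n / per_amino_acid[GENETIC_CODE[c]] for c, n in counts.items()}


usage = codon_usage("CTGCTGCTGCTTAAAAAG")
print(usage)
```

An ORF whose codon choices match the usage seen in known genes of the organism is a more plausible gene than one that does not, which is the kind of "style" signal gene finders exploit.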
24
EXON
INTRON
A
TC
25
GT
AG
26
238 genes, 1254 introns
Distribution of nucleotides near the end of an
intron
27
Distribution of exon lengths in three species
28
Distribution of intron lengths in three species
29
About the splicing sites (intron/exon junctions)
The average number of introns in human genes is 5
The gene coding for Titin harbors 233 introns!
Suppose the introns are predicted with 85% accuracy.
Take a gene with 8 introns: 0.85^8 ≈ 0.27
Many exons are short (< 50 bases) and separated by
long introns (> 1000 bases)
There are mini-exons as short as 3 bases in
A. thaliana -> unpredictable
While > 80% of a bacterial chromosome consists of
genes, they account for less than 5% of the
human genome
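The estimate for a gene with 8 introns, each predicted with 85% accuracy, can be checked directly. The sketch assumes that the predictions for the individual introns are independent; the function name is hypothetical.

```python
def prob_all_introns_correct(accuracy: float, n_introns: int) -> float:
    """Probability that every intron of a gene is predicted correctly,
    assuming independent predictions with the same per-intron accuracy."""
    return accuracy ** n_introns


print(round(prob_all_introns_correct(0.85, 8), 2))   # 0.27
print(prob_all_introns_correct(0.85, 233))           # the Titin case: vanishingly small
```

So even a seemingly good per-intron accuracy gets the complete structure of an 8-intron gene right only about one time in four.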
30
DNA Chips
31
Gene (or part of gene) no. 1
32
2

33
(No Transcript)
34
(No Transcript)
35
(No Transcript)
36
(No Transcript)
37
A A
B B
Problem: how to identify which bacteria are
present in a given sample?
38
How to discriminate between acute lymphoblastic
leukemia (ALL) and acute myeloid leukemia (AML)?
-> 38 patients, 27 ALL and 11 AML
-> chip comprising 6817 human genes
-> 1100 genes seem discriminating
-> 50 genes kept after analysis
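The slide does not say how the discriminating genes were selected. One simple possibility, assumed here purely for illustration, is to rank genes by a signal-to-noise score: the difference between the class means divided by the sum of the class standard deviations. The gene names and expression values below are made-up toy data, not measurements from the study.

```python
from statistics import mean, stdev


def signal_to_noise(values_a: list, values_b: list) -> float:
    """Difference of class means relative to the spread within each class."""
    return (mean(values_a) - mean(values_b)) / (stdev(values_a) + stdev(values_b))


def rank_genes(expression: dict, labels: list, class_a: str, class_b: str) -> list:
    """Sort genes from most to least discriminating between the two classes."""
    scores = {}
    for gene, values in expression.items():
        a = [v for v, lab in zip(values, labels) if lab == class_a]
        b = [v for v, lab in zip(values, labels) if lab == class_b]
        scores[gene] = signal_to_noise(a, b)
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)


# Hypothetical toy data: one expression value per patient and per gene.
labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]
expression = {
    "gene_1": [9.0, 10.0, 11.0, 1.0, 2.0, 3.0],   # high in ALL, low in AML
    "gene_2": [5.0, 6.0, 4.0, 5.0, 4.0, 6.0],     # same in both classes
    "gene_3": [2.0, 3.0, 1.0, 7.0, 9.0, 8.0],     # low in ALL, high in AML
}
print(rank_genes(expression, labels, "ALL", "AML"))   # ['gene_1', 'gene_3', 'gene_2']
```

Keeping only the top of such a ranking is one way to go from thousands of genes on the chip to a short list.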
39
(No Transcript)
40
-> Spot-tracking program
-> Statistical analysis of the
measurements
41
Co-regulated genes -> Interaction
network(s)
SYSTEMS BIOLOGY -> modeling of the
cell
42
Co-regulation of gene expression: Enzymes that
are involved in the same metabolic pathway
43
(No Transcript)
44
Co-regulation of gene expression: Enzymes that
are built up by different protein subunits
45
The Lactose Operon
46
The Lactose Operon
47
(No Transcript)
48
(No Transcript)
49
(No Transcript)
50
Modeling and Simulation of Genetic Regulatory
Systems
J. Comput. Biol. 9 (2002) 67-103
Directed and undirected graphs
Bayesian networks
Boolean networks
Generalized logical networks
Nonlinear ordinary differential equations
Piecewise-linear differential equations
Qualitative differential equations
Partial differential equations
Stochastic master equations
Rule-based formalisms
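To give a feel for one of the formalisms listed, here is a sketch of a synchronous Boolean network: each gene is either on or off, and all genes are updated at once from the previous state. The three genes and their rules are hypothetical (a small negative feedback loop) and do not model any real system.

```python
def step(state: dict, rules: dict) -> dict:
    """Compute the next state: every gene is updated from the current state."""
    return {gene: rule(state) for gene, rule in rules.items()}


def simulate(state: dict, rules: dict, n_steps: int) -> list:
    """Return the list of states visited, starting with the initial one."""
    trajectory = [dict(state)]
    for _ in range(n_steps):
        state = step(state, rules)
        trajectory.append(state)
    return trajectory


# Hypothetical loop: a activates b, b activates c, c represses a.
rules = {
    "a": lambda s: not s["c"],
    "b": lambda s: s["a"],
    "c": lambda s: s["b"],
}
initial = {"a": True, "b": False, "c": False}
for state in simulate(initial, rules, 6):
    print(state)
```

With these rules the network cycles through six states and returns to its starting point, a simple example of the oscillations a negative feedback loop can produce.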
