What is Bioinformatics? - PowerPoint PPT Presentation

About This Presentation
Title: What is Bioinformatics?
Description: What is Bioinformatics? The Data. The Analysis. Comparison. Evolution ... Peter Simmonds (Edinburgh) Bioinformatics Research Centre, Dk. Funding: MRC & EPSRC ...

Slides: 31
Provided by: hein

Transcript and Presenter's Notes

Title: What is Bioinformatics?


1
What is Bioinformatics?
  • The Data
  • The Analysis
  • Comparison
  • Evolution
  • Long Distance: Comparative Genomics
  • Short Distance: Variation Analysis
  • Homology
  • Non-homology
  • Physical/Chemical/Statistical Mathematical Modelling
2
The Data & its growth.
  • 1976/79 The first viral genome: MS2/φX174
  • 1995 The first prokaryotic genome: H. influenzae
  • 1996 The first unicellular eukaryotic genome: Yeast
  • 1997 The first multicellular eukaryotic genome: C. elegans
  • 2001 The human genome: 3 Gb
1.5.03 Known: >1000 viral genomes, 96 prokaryotic genomes, 16 archaebacterial genomes.
A series of multicellular genomes are coming.
A general increase in data involving higher structures and dynamics of biological systems.
3
Genomes Tree of Life
  • 3.5-3.8 Gyr Origin of Life
  • 3 Gyr LUCA
  • 1.4 Gyr Origin of Eukaryotes
  • 5-600 Myr Origin of Vertebrates
  • 200 Myr Origin of Mammals
  • 80-100 Myr Mouse Mammalian Split
  • 5-7 Myr Chimp-Human Split
  • 100 Kyr-Myr Age of Polymorphisms

From Janssen, 2003
4
Comparison of Evolutionary Objects.
Renin
HIV proteinase
General Theme. Formal Model of Structure
Stochastic Model of Structure Evolution.
Interaction Networks Any Graph.
Gene Structure
5
The Phylogeny for Evolutionary Objects
MRCA-Most Recent Common Ancestor
?
Time Direction
Parameters: time, rates, selection
Unobservable Evolutionary Path
[Figure: three observable sequences at the tips of the tree, each shown as ATTGCGTATATAT.CAG]
3 Problems
  • i. Test all possible relationships.
  • ii. Examine unknown internal states.
  • iii. Explore unknown paths between states at nodes.
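Problem i becomes hard quickly because the number of candidate relationships explodes with the number of sequences. The sketch below is an illustration only, and it assumes that "relationships" means rooted, bifurcating trees with labelled leaves (the slide does not say which tree class is meant). Under that assumption the count is the standard double factorial (2n-3)!!. The function name is hypothetical.

```python
def count_rooted_topologies(n_leaves: int) -> int:
    """Number of rooted, bifurcating, leaf-labelled trees on n_leaves tips.

    Assumption (not from the slides): candidate relationships are rooted
    binary trees, for which the count is (2n-3)!! = 3 * 5 * ... * (2n-3).
    """
    if n_leaves < 1:
        raise ValueError("need at least one leaf")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-3 (empty product for n <= 2).
    for k in range(3, 2 * n_leaves - 2, 2):
        count *= k
    return count


for n in (3, 4, 10):
    print(n, count_rooted_topologies(n))
```

Already at ten sequences there are tens of millions of candidate trees, which is why exhaustive testing is replaced by search or sampling in practice.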
6
Gene and Genome Evolution
Chimp
Mouse
Higher Cells
E.coli
Fish
  • Basic Events
  • substitutions.
  • insertion deletions.
  • Chromosome Level events inversions,
    duplications, transpositions,..
  • Average Number of Mitoses
  • Per Male generation (15: 35 .. 20: 150)
  • Per Female generation: 24
  • Single nucleotide substitutions: 10^-7
  • Microsatellites (100.000): 10^-2
  • Small insertion deletions: 10^-8

7
Principles of String Comparison: Alignment
ACTG-T      ACT-GT
ACTCCT      ACTCCT
[Figure: two paths from ACTGT to ACTCCT, one via ACTGCT and one via ACTCGT, each labelled .41]
Cost: 2
Probability: e^-16.47
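The cost of 2 for this pair can be reproduced with the classic dynamic-programming edit distance. This is a minimal sketch that assumes unit costs for substitutions and for insertion-deletions; the slide does not state its cost scheme, and the function and parameter names are hypothetical.

```python
def edit_distance(a: str, b: str, sub_cost: int = 1, indel_cost: int = 1) -> int:
    """Minimum total cost of substitutions and indels turning a into b."""
    # prev[j] holds the cost of aligning the processed prefix of a with b[:j].
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * indel_cost]  # aligning a[:i] with the empty string
        for j, cb in enumerate(b, start=1):
            match_or_sub = prev[j - 1] + (0 if ca == cb else sub_cost)
            delete = prev[j] + indel_cost      # ca aligned to a gap
            insert = curr[j - 1] + indel_cost  # cb aligned to a gap
            curr.append(min(match_or_sub, delete, insert))
        prev = curr
    return prev[-1]


print(edit_distance("ACTGT", "ACTCCT"))  # 2: one substitution plus one indel
```

Both alignments shown on the slide (ACTG-T and ACT-GT against ACTCCT) achieve this minimum, so the optimal alignment is not unique even when the optimal cost is.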
8
Maximum likelihood phylogeny and alignment
Gerton Lunter, Istvan Miklos, Alexei Drummond, Yun Song
Human alpha hemoglobin, Human beta hemoglobin, Human myoglobin, Bean leghemoglobin
Probability of data: e^-1560.138
Probability of data and alignment: e^-1593.223
Probability of alignment given data: 4.279 × 10^-15 = e^-33.085
Ratio of insertion-deletions to substitutions: 0.0334
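The three probabilities on this slide are linked by the definition of conditional probability, P(alignment | data) = P(data, alignment) / P(data), which becomes a subtraction on the log scale. The short check below uses only the numbers quoted on the slide; the variable names are illustrative.

```python
import math

# Natural-log probabilities as quoted on the slide.
log_p_data = -1560.138
log_p_data_and_alignment = -1593.223

# log P(alignment | data) = log P(data, alignment) - log P(data)
log_p_alignment_given_data = log_p_data_and_alignment - log_p_data
p_alignment_given_data = math.exp(log_p_alignment_given_data)

print(f"log P(alignment | data) = {log_p_alignment_given_data:.3f}")
print(f"P(alignment | data)     = {p_alignment_given_data:.3e}")
```

Working with logs avoids underflow: e^-1560 is far below the smallest positive double, while the difference of the two logs is an ordinary number.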
9
Rooting using irreversibility (Lunter)
Reversibility

The Pulley Principle

Contagious Dependence
CG avoidance creates irreversibility
Lunter and Hein, ISMB2004
10
Comparison of Evolutionary Objects.
Observable
Unobservable
Goldman, Thorne & Jones, 96
Knudsen & Hein, 99
Eddy & co.
Meyer and Durbin, 02
Pedersen & Hein, 03
Siepel & Haussler, 03
Observable
Unobservable
11
The Rise of Comparative Genomics
Lander et al. (2001), Figure 25A
12
Recursive Definition of Strings
Gene Grammar
RNA Grammar
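[Figure: derivation trees. RNA grammar: S expanded through the intermediate strings sS, ssS, ssdSd, ssddSdd, ssddSdds. Gene grammar: tree with nodes labelled S, I, E and A, and codons ATG and GAG.]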
Gene Grammar: S -> E | I,  E -> eE | eI | e,  I -> iE | iI | e
RNA Grammar: S -> sS | Ss | dSd | SS | e
13
Stochastic Grammars
S -> (0.3) aSa | (0.5) bSb | (0.1) aa | (0.1) bb
S -> aSa -> abSba -> abaaba, with probability 0.3 × 0.5 × 0.1 = 0.015
If a string has a unique (1-1) derivation, its probability is the product of the probabilities of the applied rules.
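The product rule can be checked directly for the grammar on this slide. Every string this grammar generates has exactly one derivation, so a simple recursion that peels off matching outer letters gives the string probability. This is a sketch for this specific grammar only, not a general stochastic grammar parser; the names are hypothetical.

```python
import math

# Rule probabilities from the slide:
#   S -> aSa (0.3) | bSb (0.5) | aa (0.1) | bb (0.1)
WRAP_RULES = {"a": 0.3, "b": 0.5}        # S -> xSx
TERMINAL_RULES = {"aa": 0.1, "bb": 0.1}  # S -> xx


def string_probability(s: str) -> float:
    """Probability that the grammar generates s (0.0 if it cannot)."""
    if s in TERMINAL_RULES:
        return TERMINAL_RULES[s]
    if len(s) > 2 and s[0] == s[-1] and s[0] in WRAP_RULES:
        # Apply S -> xSx, then recurse on the inner string.
        return WRAP_RULES[s[0]] * string_probability(s[1:-1])
    return 0.0


# Derivation S -> aSa -> abSba -> abaaba uses rules with 0.3, 0.5 and 0.1.
print(math.isclose(string_probability("abaaba"), 0.3 * 0.5 * 0.1))
```

For ambiguous grammars, where a string has several derivations, the probabilities of all derivations have to be summed, which is what dynamic-programming parsers do.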
14
Grammars: Finite Set of Rules for Generating Strings
  • Regular
  • Context Free
  • Context Sensitive
  • General (also erasing)
finished: no variables
15
Structure Dependent Evolution: RNA
[Figure: the sequence U A C A C C G U shown four times]
From Bjarne Knudsen
16
RNA Structure Application
From Knudsen & Hein (1999)
Knudsen & Hein, 2003
17
Observing Evolution has 2 parts
P(x)
x
x
P(Further history of x)
18
Inter- and Intra-species Comparisons
At shorter time scales
  • For sequences sampled within a population, their
    relationship is determined by population
    structure. There is no analogue of this for
    interspecies sequences.
  • Is within-species variation a short time slice of
    long-term variation?
  • Where do the species and population perspectives
    meet?

19
Short Time Evolution: Population Genetics and History
[Figure: Ancestral Recombination Graph. Axes: Time; Population, 1 to N. Each sequence is labelled 1 2.]
Three large areas of application
Interpretation of Variation
  • Human Population History
  • Gene Mapping
  • Pathogen Evolution
Cardon, Donnelly, Griffiths, McVean, Wiuf, Song, Schierup
20
Time slices
All positions have found a common ancestor on one sequence
All positions have found a common ancestor
[Figure: axes: Time; Population, 1 to N. Each sequence is labelled 1 2.]
21
Applications to Human Genome (Chr 1) (Wiuf and Hein, 97)
4Ne: 20,000; Segments: 52,000; Ancestors: 6,800
A randomly picked ancestor (ancestral material
comes in batteries!)
22
The Origin of Variation
[Figure: nucleotides C, G and T along a Time axis; population axis 1 to N; label: Show variation]
Inter. SNP Consortium (2001). A map of human genome sequence variation containing 1.42 million SNPs. Nature 409: 928-33.
23
Slice in Space
[Figure: axes: Time; population 1 to N]
24
Minimal ARGs and Haplotype Blocks (Song)
a: (3,4), b: (3,4), c: (15,16), d: (16,17), e: (35,36), f: (35,36), g: (36,37)
25
Yun Song, 2004
26
Genotype and Phenotype Covariation: Gene Mapping
Time
Reich et al. (2001)
Rafnar et al. (2004), Morris et al. (2001)
27
Finding Homologies
Database
New Sequence
R. Doolittle et al. (1983).
New Sequence: Simian Sarcoma Virus onc Gene
Similar Sequence: Platelet-Derived Growth Factor
P28SIS 51 GGELESLARGSLGSLSVAEPAMIAECKTRTEVFEISAALIDATNANFLVWPPCVEVQACSGCCNNRN..
PDGF-1  1 ----------SLGSLTIAEPAMIAECKTREEVCFCIAAL?DA????????PPCVEVKACTGCCNNRN..


Properties for the known sequence are transferred
to the new sequence, immediately yielding
biological hypotheses about the new sequence.
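How similar two aligned sequences are can be summarised as the fraction of identical residues among the columns where both residues are known. The sketch below is illustrative only: the function name is hypothetical, skipping gap ("-") and unknown ("?") columns is an assumption, and the two fragments are the first fully determined stretch of the P28SIS and PDGF-1 rows quoted on the slide.

```python
def aligned_identity(row_a: str, row_b: str, gap: str = "-", unknown: str = "?"):
    """Return (matches, compared) for two equal-length alignment rows.

    Columns containing a gap or an unknown residue in either row are
    skipped (an assumption made for this sketch).
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    matches = 0
    compared = 0
    for x, y in zip(row_a, row_b):
        if x in (gap, unknown) or y in (gap, unknown):
            continue
        compared += 1
        if x == y:
            matches += 1
    return matches, compared


# Fragments copied from the slide's alignment.
p28sis = "SLGSLSVAEPAMIAECKTRTEVFEISAAL"
pdgf_1 = "SLGSLTIAEPAMIAECKTREEVCFCIAAL"
matches, compared = aligned_identity(p28sis, pdgf_1)
print(f"{matches} identical residues out of {compared} compared columns")
```

Database search tools score candidates with substitution matrices and significance statistics instead of raw identity, but the count above conveys why such a match stands out.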
28
Knowledge Based: The Products of Evolution - An Example (D. Baker)
Sequence
Structure
Make a List
Choose a global structure that doesn't create new local structures!
29
What is Bioinformatics?
  • The Data
  • The Analysis
  • Comparison
  • Evolution
  • Long Distance: Comparative Genomics
  • Short Distance: Variation Analysis
  • Homology
  • Non-homology
  • Physical/Chemical/Statistical Mathematical Modelling
30
Jotun Hein
Alexei Drummond, Roald Forsberg, Bjarne Knudsen, Istvan Miklos, Jakob Skou Pedersen, Santiago Schnell, Carsten Wiuf
Gerton Lunter, Rune Lyngsoe, Irmtraud Meyer, Yun Song, Jennifer Taylor
Lizhong Hao, Ben Holtom, Stephen McCauley
  • Methodology
  • Evolutionary Models
  • Alignment
  • Expression Data
  • Genome and Gene Evolution
  • Sequence Variation Data & Recombination
  • RNA Secondary Structure and Evolution
  • Collaborations
  • William Cookson (WCHG)
  • John Hancock (Harwell MRC)
  • Peter Simmonds (Edinburgh)
  • Bioinformatics Research Centre, Dk

Homepage: http://www.stats.ox.ac.uk/mathgen/bioinformatics/