Title: Algorithms FDK Center of Excellence
Department of Computer Science
Mission
- Sequences Are Everywhere
- combinatorial pattern matching
- pattern discovery in sequences
- dynamic programming, automata theory, advanced data structures, probabilistic modeling
- algorithms on strings and biological sequence analysis, studied since 1980; many of our results appear in textbooks
J. Kärkkäinen, P. Sanders, and S. Burkhardt: Linear work suffix array construction. J. ACM 53 (2006), 918-936.
- direct construction of a suffix array in linear time
- immediately included in teaching materials internationally
[Figure: the suffixes of abaab (abaab, baab, aab, ab, b) and SuffixTree(abaab); sorting the suffixes gives aab, ab, abaab, b, baab]
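As a point of reference, the sketch below builds a suffix array by sorting the suffixes directly. This is the definition, not the linear-work algorithm of Kärkkäinen, Sanders, and Burkhardt: it can take O(n² log n) time because each comparison may scan a whole suffix. It also shows why the array is useful: all occurrences of a pattern form one contiguous block of sorted suffixes, found with two binary searches. The function names are illustrative.

```python
def suffix_array(text: str) -> list[int]:
    """Start positions of the suffixes of text in lexicographic order (naive sort)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """All start positions of pattern in text, via two binary searches over sa."""
    m = len(pattern)
    # Lower bound: first sorted suffix that is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix (from start on) whose length-m prefix exceeds pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


sa = suffix_array("abaab")
print(sa)                              # [2, 3, 0, 4, 1]
print(occurrences("abaab", sa, "ab"))  # [0, 3]
```

The sorted order [2, 3, 0, 4, 1] corresponds to the suffixes aab, ab, abaab, b, baab in the figure.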
V. Mäkinen, G. Navarro, and E. Ukkonen: Transposition invariant string matching. J. Algorithms 56 (2005).
E. Ukkonen, K. Lemström, and V. Mäkinen: Sweepline the music! LNCS 2598 (2003), 330-342.
- Transposition invariant variants of string matching algorithms
- Music retrieval
[Figure: transposition by -2]
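For the exact case, a common reduction makes matching transposition invariant: replace each pitch sequence by its sequence of successive intervals, so that a pattern and any transposed copy of it have identical interval sequences. The sketch below assumes pitches are integers (for example MIDI note numbers) and compares windows naively; it does not cover the approximate variants studied in the cited papers, and the function names are hypothetical.

```python
def intervals(pitches: list[int]) -> list[int]:
    """Differences between consecutive pitches; unchanged by transposition."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def transposed_occurrences(text: list[int], pattern: list[int]) -> list[tuple[int, int]]:
    """(start, shift) for every window where text[start + i] == pattern[i] + shift for all i."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    t, p = intervals(text), intervals(pattern)
    k = m - 1  # length of the pattern's interval sequence
    return [
        (i, text[i] - pattern[0])
        for i in range(len(t) - k + 1)
        if t[i:i + k] == p
    ]


# The pattern 62, 64, 66 occurs transposed by -2 (as 60, 62, 64) at position 1.
print(transposed_occurrences([67, 60, 62, 64, 65], [62, 64, 66]))  # [(1, -2)]
```

Any exact string matching algorithm could replace the naive window comparison on the interval sequences.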
P. Rastas, M. Koivisto, H. Mannila, and E. Ukkonen: A hidden Markov technique for haplotype reconstruction. WABI 2005, 140-151.
[Figure: founder, SNP]
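A toy illustration of the hidden Markov idea, assuming a much simpler model than the cited paper: a haplotype is treated as a mosaic of known founder sequences, the hidden state at each SNP is the founder being copied, switching founders has probability `switch_prob`, and an allele differs from the copied founder with probability `error_prob`. Viterbi decoding then recovers the most probable founder at each SNP. The parameter values and function name are hypothetical, and the paper's actual model and training procedure are not reproduced here.

```python
import math


def viterbi_founder_path(founders, haplotype, switch_prob=0.01, error_prob=0.01):
    """Most probable founder index at each SNP under a toy founder-copying HMM.

    founders: equal-length allele strings (e.g. "0101"), one per founder.
    haplotype: allele string of the same length, explained as a founder mosaic.
    """
    k, n = len(founders), len(haplotype)
    if n == 0:
        return []

    def emit(f, j):
        # log P(observed allele | copying founder f at SNP j)
        same = founders[f][j] == haplotype[j]
        return math.log(1 - error_prob) if same else math.log(error_prob)

    def trans(g, f):
        # log P(founder f at SNP j | founder g at SNP j-1)
        if g == f:
            return math.log(1 - switch_prob)
        return math.log(switch_prob / (k - 1))

    score = [math.log(1 / k) + emit(f, 0) for f in range(k)]
    back = []  # back[j-1][f] = best predecessor of founder f at SNP j
    for j in range(1, n):
        ptr = [max(range(k), key=lambda g: score[g] + trans(g, f)) for f in range(k)]
        score = [score[ptr[f]] + trans(ptr[f], f) + emit(f, j) for f in range(k)]
        back.append(ptr)

    # Trace back from the best final state.
    f = max(range(k), key=lambda g: score[g])
    path = [f]
    for ptr in reversed(back):
        f = ptr[f]
        path.append(f)
    return path[::-1]


print(viterbi_founder_path(["0000000", "1111111"], "0001111"))  # [0, 0, 0, 1, 1, 1, 1]
```

Working in log space keeps long SNP sequences from underflowing.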
O. Hallikas et al.: Genome-wide prediction of mammalian enhancers based on analysis of transcription-factor binding affinity. Cell 124 (2006), 47-59.
[Figure: an enhancer module and genes gene1-gene4; DNA, transcription, RNA, translation, proteins; transcription factors]
Computational identification of enhancer elements
- Preserved in evolution
- Affinities of functional cis-elements.
- Spatial arrangement of elements within a module.
[Figure: Human, Mouse]
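Binding affinity of a transcription factor to a DNA site is often approximated with a position weight matrix (PWM). The sketch below is a standard log-odds scan, not necessarily the scoring used in the cited Cell paper: it converts hypothetical per-position base counts into log2-odds scores against a uniform background and reports windows whose total score reaches a threshold. It assumes uppercase A/C/G/T input; the counts, pseudocount, and threshold are illustrative.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Turn per-position base counts into log2-odds scores vs. a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background)
            for b in BASES
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, score) for every window whose summed log-odds score >= threshold."""
    width = len(matrix)
    hits = []
    for i in range(len(sequence) - width + 1):
        score = sum(matrix[j][sequence[i + j]] for j in range(width))
        if score >= threshold:
            hits.append((i, score))
    return hits


# Hypothetical counts for a motif that is almost always "GAT".
motif = log_odds_matrix([{"G": 10}, {"A": 10}, {"T": 10}])
print(scan("CCGATCC", motif, threshold=4.0))  # one hit, at position 2
```

The pseudocount keeps unseen bases from receiving a score of minus infinity.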
Enhancer prediction for N-myc (Cell 2006, Nat. Protocols 2006)
[Figure: 200 kb mouse and human N-Myc genomic regions]
Conserved GLI binding sites in two predicted enhancer elements, CM5 and CM7
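Conservation and spatial arrangement can be combined by aligning the predicted sites of two orthologous regions. The sketch below is a simplified dynamic program, assuming each site is a hypothetical (position, factor, affinity) triple. It finds an order-preserving, one-to-one pairing of human and mouse sites bound by the same factor that maximizes the summed affinity, scoring each pair by its smaller affinity. Unlike a full enhancer-alignment score, it ignores the distances between sites.

```python
def align_sites(human, mouse):
    """Max total affinity of an order-preserving pairing of same-factor sites.

    human, mouse: lists of (position, factor, affinity), sorted by position.
    Returns (score, [(human_position, mouse_position), ...]).
    """
    n, m = len(human), len(mouse)
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Skip a human site or a mouse site.
            best[i][j] = max(best[i - 1][j], best[i][j - 1])
            h, mo = human[i - 1], mouse[j - 1]
            if h[1] == mo[1]:  # same transcription factor: sites may be paired
                best[i][j] = max(best[i][j], best[i - 1][j - 1] + min(h[2], mo[2]))
    # Trace back the chosen pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j]:
            i -= 1
        elif best[i][j] == best[i][j - 1]:
            j -= 1
        else:
            pairs.append((human[i - 1][0], mouse[j - 1][0]))
            i, j = i - 1, j - 1
    return best[n][m], pairs[::-1]


human = [(10, "GLI", 2.0), (50, "SOX", 1.0), (90, "GLI", 3.0)]
mouse = [(12, "GLI", 1.5), (95, "GLI", 2.5)]
print(align_sites(human, mouse))  # (4.0, [(10, 12), (90, 95)])
```

This is the longest-common-subsequence recurrence with weighted matches.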
A. Rantanen: Algorithms for ¹³C metabolic flux analysis. PhD thesis, 2006.
Future goals
- Indexing sequential data for approximate searches
- Distance functions for sequences
- Theoretical framework, efficient evaluation, complexity bounds, relations between distances
- Application-specific distances (XML, images, music, ...)
- Finding structure and signals in sequences
- Supervised and unsupervised learning of signals
- Statistical significance of the findings
- Systems biology: how does the program encoded in genomes work?
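Edit (Levenshtein) distance is the textbook example of a distance function for sequences, and the same dynamic program yields approximate search. The sketch below computes the distance with a two-row table and, to illustrate approximate searching, uses Sellers' variant, which reports every text position where some substring ending there is within distance k of the pattern. It scans the text in O(mn) time without any index; the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (unit-cost insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # delete ca
                           cur[j - 1] + 1,               # insert cb
                           prev[j - 1] + (ca != cb)))    # match or substitute
        prev = cur
    return prev[-1]


def approximate_matches(pattern: str, text: str, k: int) -> list[int]:
    """Sellers: end positions j where some substring text[s..j] is within distance k of pattern."""
    col = list(range(len(pattern) + 1))
    ends = []
    for j, ct in enumerate(text):
        new = [0]  # an occurrence may start anywhere, so row 0 costs nothing
        for i, cp in enumerate(pattern, 1):
            new.append(min(col[i] + 1, new[i - 1] + 1, col[i - 1] + (cp != ct)))
        col = new
        if col[-1] <= k:
            ends.append(j)
    return ends


print(edit_distance("kitten", "sitting"))           # 3
print(approximate_matches("abc", "xabxcabc", 1))    # [2, 3, 4, 6, 7]
```

An index for approximate search aims to answer such queries without scanning the whole text.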