Activities in Combinatorial Pattern Matching


1
Activities in Combinatorial Pattern Matching
  • FDK SAB Meeting
  • March 22 2004

2
Algorithms on strings
  • members: J Kärkkäinen, V Mäkinen, K Fredriksson
    (-12/2002), K Lemström, H Tamm, S Inenaga
    (9/2003-), S Burkhardt (2/2004-)
  • direct linear time construction of suffix arrays
    (J Kärkkäinen, ICALP 03): elegant algorithm based
    on a novel approach; textbook material
  • gapped q-gram filters for approximate string
    matching (Burkhardt/Kärkkäinen): generalized
    q-grams give better filters
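
For orientation, a suffix array lists the starting positions of all suffixes of a text in lexicographic order. The sketch below builds one by plain sorting. It is not the linear-time construction cited above and only illustrates the data structure; the function name `naive_suffix_array` is an assumption made for this example.

```python
def naive_suffix_array(text: str) -> list[int]:
    """Return the start positions of the suffixes of `text` in sorted order.

    Plain comparison sort of the suffixes, O(n^2 log n) in the worst case.
    This is NOT the linear-time algorithm mentioned on the slide.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


# Suffixes of "banana" in sorted order: a, ana, anana, banana, na, nana
print(naive_suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```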

3
Algorithms on strings (cont.)
  • string database query systems
  • efficient implementations of finite multitape
    automata
  • minimization of finite-state automata
  • THM: A bideterministic finite-state automaton is
    minimal
  • PhD project of H Tamm
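
As an illustration of the property named in the theorem, an automaton is usually called bideterministic when both it and its reversal are deterministic. The sketch below tests that condition on a transition list. The representation (triples of state, symbol, state) and the name `is_bideterministic` are assumptions for this example, and the sketch does not check that all states are reachable.

```python
from collections import Counter

Transition = tuple[str, str, str]  # (source state, symbol, target state)


def is_bideterministic(transitions: set[Transition],
                       initial: set[str],
                       final: set[str]) -> bool:
    """Check that the automaton and its reversal are both deterministic."""
    # Exactly one initial state, and exactly one final state
    # (the final state is the initial state of the reversed automaton).
    if len(initial) != 1 or len(final) != 1:
        return False
    # Forward determinism: at most one transition per (source, symbol).
    forward = Counter((src, sym) for src, sym, _ in transitions)
    # Backward determinism: at most one transition per (target, symbol).
    backward = Counter((dst, sym) for _, sym, dst in transitions)
    return (all(count == 1 for count in forward.values())
            and all(count == 1 for count in backward.values()))


# a*b: loop on q0 with 'a', then 'b' into the single final state q1.
a_star_b = {("q0", "a", "q0"), ("q0", "b", "q1")}
print(is_bideterministic(a_star_b, {"q0"}, {"q1"}))  # True
```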

4
Transposition invariance
Transposition by -2
5
Algorithms on strings (cont.)
  • transposition invariant string matching
  • motivated by music information retrieval
  • $A = a_1 \dots a_m$ translated by $t$:
    $A+t = a_1+t, \dots, a_m+t$
  • translation invariant distance:
    $d_T(A,B) = \min_{t \in S} d(A+t, B)$

6
Algorithms on strings (cont.)
  • exact case is simple
  • interval: $a_{i+1} - a_i$
  • $\mathrm{intervals}(A+t) = \mathrm{intervals}(A)$
  • use interval sequences
    $a_2 - a_1, a_3 - a_2, \dots, a_m - a_{m-1}$
    instead of originals in exact matching
  • transposition invariance for free
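
A minimal sketch of the interval idea, assuming notes are given as integer pitches (for example MIDI numbers). The function names are illustrative, and the naive window comparison stands in for any exact string matcher.

```python
def intervals(seq: list[int]) -> list[int]:
    """Differences between consecutive elements; unchanged by transposition."""
    return [b - a for a, b in zip(seq, seq[1:])]


def find_transposed(pattern: list[int], text: list[int]) -> list[int]:
    """Start positions in `text` where some transposition of `pattern` occurs."""
    if len(pattern) <= 1:
        # An empty or single-note pattern matches at every possible position.
        return list(range(len(text) - len(pattern) + 1))
    p, t = intervals(pattern), intervals(text)
    k = len(p)
    # Naive exact matching on the interval sequences.
    return [i for i in range(len(t) - k + 1) if t[i:i + k] == p]


# C D E (60 62 64) occurs transposed by +5 as F G A (65 67 69) at position 1.
print(find_transposed([60, 62, 64], [55, 65, 67, 69, 60]))  # [1]
```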

7
Algorithms on strings (cont.)
  • approximate case: edit distance
  • repeat dynamic programming for all $O(mn)$ relevant
    transpositions: $O(m^2 n^2)$
  • apply sparse dynamic programming: $O(mn \log m)$
  • V Mäkinen's PhD Thesis (2003)
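
The brute-force baseline from the second bullet can be sketched as follows: run the standard edit-distance dynamic programming once for every relevant transposition $t = b_j - a_i$ and keep the minimum. This is only the quadratic-per-transposition baseline, not the sparse dynamic programming method; unit edit costs and the function names are assumptions for this example.

```python
def edit_distance(a: list[int], b: list[int]) -> int:
    """Unit-cost edit distance by standard dynamic programming, O(|a||b|)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # match or substitute
        prev = cur
    return prev[-1]


def transposition_invariant_distance(a: list[int], b: list[int]) -> int:
    """Minimum over t of edit_distance(a + t, b), trying only t = b_j - a_i."""
    # Any other t creates no matching pair, so it cannot give a smaller value.
    # t = 0 is added so that empty inputs are handled as well.
    candidates = {y - x for x in a for y in b} | {0}
    return min(edit_distance([x + t for x in a], b) for t in candidates)


# Transposing by +7 aligns three of the four notes; one substitution remains.
print(transposition_invariant_distance([60, 62, 64, 65], [67, 69, 70, 72]))  # 1
```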

8
Matches cover all pairs in DP table
$b_j - a_i$
9

10
[Figure: piano-roll representation of the melody
C C D C F E C C D C G F C C C1 A F E D B B A F G F;
horizontal axis: time]
11
Algorithms on strings: Piano-roll matching
  • geometric pattern matching under translations
    (Brass 2002)
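
To make the problem statement concrete: with notes as points (onset time, pitch), a translated occurrence of a pattern P in a target T is a vector f with P + f ⊆ T. The sketch below is a simple hashing baseline, not the algorithm of the cited work; the point representation and the function name are assumptions for this example.

```python
Point = tuple[int, int]  # (onset time, pitch)


def translated_occurrences(pattern: list[Point],
                           target: list[Point]) -> list[Point]:
    """All translations f = (dt, dp) such that pattern + f is a subset of target."""
    if not pattern:
        return []
    target_set = set(target)
    px, py = pattern[0]
    found = set()
    # The first pattern point must land on some target point, which fixes f.
    for tx, ty in target:
        dx, dy = tx - px, ty - py
        if all((x + dx, y + dy) in target_set for x, y in pattern):
            found.add((dx, dy))
    return sorted(found)


pattern = [(0, 60), (1, 62), (2, 64)]
target = [(0, 50), (4, 65), (5, 67), (6, 69), (7, 60)]
print(translated_occurrences(pattern, target))  # [(4, 5)]
```

With m pattern points and n target points this checks n candidate translations with m set lookups each.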

12
13
Algorithms on strings: Patterns with small tree
dimension
kT = 2
  • type of a positive edge (a,b): b-a
  • tree dimension kT of P: smallest number of edge
    types in a positive spanning tree of P
14
Algorithms on strings: Geometric generalization
of the Knuth-Morris-Pratt algorithm
  • THM: Translated occurrences of P with
    tree-dimension kT can be found in O(2mmkTn)
    bitvector operations on m-bit vectors.
  • THM: Finding kT is NP-complete; it can be
    approximated within a logarithmic factor by the
    greedy set cover algorithm
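
The slides do not spell out how finding kT is encoded as a set cover instance, so the sketch below shows only the generic greedy set cover heuristic that the theorem refers to. The input format (a universe and named subsets) and the function name are assumptions for this example.

```python
def greedy_set_cover(universe: set, subsets: dict[str, set]) -> list[str]:
    """Repeatedly pick the subset covering the most still-uncovered elements."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        if not subsets:
            raise ValueError("universe cannot be covered by the given subsets")
        # Ties are broken by dictionary insertion order.
        best = max(subsets, key=lambda name: len(subsets[name] & uncovered))
        if not subsets[best] & uncovered:
            raise ValueError("universe cannot be covered by the given subsets")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen


subsets = {"A": {1, 2, 3}, "B": {2, 4}, "C": {3, 4}, "D": {4, 5}}
print(greedy_set_cover({1, 2, 3, 4, 5}, subsets))  # ['A', 'D']
```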

15
Algorithms on strings: plans
  • software library of string algorithms (J
    Kärkkäinen et al.)
  • continue with basic research: index structures, ...
  • inversion problems on sequences: given a set of
    sequences, find a model for them
  • combine combinatorial and probabilistic approaches

16
Music retrieval and analysis
  • members: K Lemström, V Mäkinen, A Pienimäki
  • efficient computational methods for music
    comparison, retrieval and analysis; monophonic vs
    polyphonic music
  • content-based retrieval: query-by-humming
  • geometric (piano-roll) sweepline algorithms
  • query engine prototype
  • open: indexing of polyphonic music
  • open: distance measures for music, computational
    characterization of musical styles

17
http://www.cs.helsinki.fi/group/cbrahms/demoengine/
18
Biological sequence analysis
  • members: T Kivioja, K Palin, P Rastas, J Vilo
    (EBI)
  • bioinformatics component of a novel DNA
    expression measurement technique, TRAC, developed
    at VTT Biotechnology: selection and pooling of
    hybridization probes for entire genomes
  • computational methods for optimizing cDNA-AFLP
    experiments
  • T Kivioja's PhD Thesis (expected in 2004)

19
Biological sequence analysis (cont.)
  • SPEXS: tool for finding regulatory patterns
    (common motifs) from a set of sequences, with
    several applications; J Vilo's PhD Thesis (2002)
  • general method for finding correlations between
    observed and predicted effects of gene knockout
    experiments (ECCB 2002): gene regulatory networks
  • the effect of SNPs on the binding affinities of
    transcription factors; comparative genomics
    approach: clustering of binding site motifs over
    multiple genomes (unpublished)

20
Biological sequence analysis (cont.)
  • algorithms for finding haplotype blocks and
    haplotype mosaics
  • inversion of recombinations
  • new Hidden Markov Model (unpublished)
  • Minimum Description Length Principle; EM
    algorithm
  • cf. group Mannila / group Toivonen

21
Haplotype data
22
Hidden Markov Model
Cross-over probabilities
23
Metabolic modeling
  • members: J Rousu (postdoc in London), A Rantanen,
    E Pitkänen
  • data from isotopic-tracing experiments
  • new closed-form method for estimation of
    metabolic fluxes in steady state
  • underdetermined linear systems
  • propagation of information through the metabolic
    network, guided by the carbon maps of individual
    reactions
  • prototype software system

24
Metabolic modeling: plans
  • WWW server based on current prototype software
    (Neobio/TEKES)
  • utilizing gene expression data
  • what-if analysis: hypothetical reactions
  • better algorithms: bi-directional reactions, cell
    compartments, higher-order balance equations,
    improved propagation of information, incremental
    versions

25
Computational structural biology
  • members: J Ravantti, K Fredriksson (-12/2002), T
    Mielikäinen, T Ojamies

26
Computational struct. biology (cont.)
  • constraint-based algorithm for tomographic
    reconstruction
  • sinogram-based (sound) method for noise reduction
  • finding consistent orientations is NP-hard and
    inapproximable
  • J Ravantti's PhD project

27
Computational struct. biology (cont.)
  • BLAST for 3D density models (arrays of voxels):
    lots of applications
  • model comparison, substructure search, shared
    substructures (all-against-all)
  • translation-rotation invariance: 6D search
  • differences in distance and density scales
  • solution under development: contour extraction,
    geometric hashing

28
Computational struct. biology (cont.)
Contour extraction
29
Computational struct. biology (cont.)
Original model
Model assembled from substructures