Magic Moments: Moment-based Approaches to Structured Output Prediction
1
Magic Moments: Moment-based Approaches to Structured Output Prediction
The Analysis of Patterns
  • Elisa Ricci
  • joint work with Nobuhisa Ueda, Tijl De Bie, Nello
    Cristianini

Thursday, October 25th
2
Outline
  • Learning in structured output spaces
  • New algorithms based on Z-score
  • Experimental results and computational issues
  • Conclusions

3
Structured data everywhere!!!
  • Many problems involve highly structured data
    which can be represented by sequences, trees and
    graphs.
  • Temporal, spatial and structural dependencies
    between objects are modeled.
  • This phenomenon is observed in several fields,
    such as computational biology, computer vision,
    natural language processing and web data analysis.

4
Learning with structured data
  • Machine learning and data mining algorithms must
    be able to analyze vast amounts of complex,
    structured data efficiently and automatically.
  • The goal of structured learning algorithms is to
    predict complex structures, such as sequences,
    trees, or graphs.
  • Applying traditional algorithms to problems
    involving structured data often implies a loss of
    information about the structure.

5
Supervised learning
  • Data are available in the form of examples and
    their associated correct answers.

Training set, hypothesis space, learning and prediction (the slide's formulas are reconstructed below).
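In standard notation (a reconstruction; the slide's formulas are not in the transcript), the setup reads:

    Training set: T = \{(x_i, y_i)\}_{i=1}^{l} \subseteq X \times Y
    Hypothesis space: H = \{ f : X \to Y \}
    Learning: find f \in H such that f(x_i) \approx y_i on the training set
    Prediction: for a new input x, output y = f(x)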
6
Classification
  • A typical supervised learning task is
    classification.

Named entity recognition (NER): locate named
entities in text. Entities of interest are person
names, location names, organization names and
miscellaneous entities (dates, times, ...).
x: observed variable, a word in a sentence.
y: label, an entity tag.
Multiclass classification.
[Figure: tagged example sentence from Spanish news wire, "PP ESTUDIA YA PROYECTO LEY TV REGIONAL REMITIDO POR LA JUNTA" (roughly: "PP already studying regional TV bill sent by the Junta"), dateline Merida, with a per-word entity tag row.]
7
Sequence labeling
  • Can we consider the interactions between adjacent
    words?
  • Goal: realize a joint labeling of all the words
    in the sentence.

Sequence labeling: given an input sequence
x = (x1, ..., xn), reconstruct the associated label
sequence y = (y1, ..., yn) of equal length.
x: observed sequence, the words in a sentence.
y: label sequence, the entity tags.
8
Sequence alignment
Biological sequence alignment is used to
determine the similarity between biological
sequences.
ACTGATTACGTGAACTGGATCCA
ACTC--TAGGTGAAGTG-ATCCA

Given two sequences S1, S2 ∈ Σ*, a global
alignment is an assignment of gaps, so as to line
up each letter in one sequence with either a gap
or a letter in the other sequence.

Σ = {A, T, G, C}; for example, S1 = ATGCTTTC and
S2 = CTGTCGCC can be aligned as

ATGCTTTC---
---CTGTCGCC
9
Sequence alignment
Sequence alignment: given a sequence pair x,
predict the correct sequence y of alignment
operations (e.g. matches, mismatches,
gaps). Alignments can be represented as
paths from the upper-left to the lower-right
corner of the alignment graph.
10
RNA secondary structure prediction
RNA secondary structure prediction: given an RNA
sequence, predict the most likely secondary
structure. The study of RNA structure
is important for understanding its function.
AUGAGUAUAAGUUAAUGGUUAAAGUAAAUGUCUUCCACACAUUCCAUCUG
AUUUCGAUUCUCACUACUCAU
[Figure: the predicted secondary structure of the sequence above.]
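For a concrete feel of the task, here is a minimal Nussinov-style dynamic program that maximizes the number of nested base pairs; it is a classic baseline, not the grammar-based model used later in this talk:

    def nussinov(seq, pairs=frozenset({("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")})):
        """Maximum number of nested base pairs in an RNA sequence (Nussinov DP)."""
        n = len(seq)
        M = [[0] * n for _ in range(n)]  # M[i][j]: best pair count on seq[i..j]
        for span in range(1, n):
            for i in range(n - span):
                j = i + span
                best = M[i + 1][j]  # case 1: position i stays unpaired
                for k in range(i + 1, j + 1):  # case 2: i pairs with some k
                    if (seq[i], seq[k]) in pairs:
                        left = M[i + 1][k - 1] if k > i + 1 else 0
                        right = M[k + 1][j] if k < j else 0
                        best = max(best, 1 + left + right)
                M[i][j] = best
        return M[0][n - 1]

    print(nussinov("GGGAAAUCC"))  # 3 nested pairs (a small stem)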
11
Sequence parsing
Sequence parsing: given an input sequence x,
determine the associated parse tree y under an
underlying context-free grammar.

Example: context-free grammar G = (V, A, R, S),
where V = {S} is the set of non-terminal symbols,
A = {G, A, U, C} is the set of terminal symbols,
and the rules R are S → SS | GSC | CSG | ASU | USA | ε.

[Figure: parse tree y.]
12
Generative models
Sequence labeling
  • Traditionally, HMMs have been used for sequence
    labeling.
  • Two main drawbacks:
  • The conditional independence assumptions are
    often too restrictive: HMMs cannot represent
    multiple interacting features or long-range
    dependencies between the observations.
  • They are typically trained by maximum likelihood
    (ML) estimation.

13
Discriminative models
  • Specify the probability of a possible output y
    given an observation x (consider the conditional
    probability P(y|x) rather than the joint
    probability P(y,x)).
  • Do not require the strict independence assumptions
    of generative models.
  • Arbitrary features of the observations can be
    considered.
  • Conditional Random Fields (CRFs)
  • [Lafferty et al., 01]

14
Learning in structured output spaces
  • Several discriminative algorithms have emerged
    recently to predict complex structures,
    such as sequences, trees, or graphs.
  • New discriminative approaches.
  • Problems analyzed:
  • Given a training set of correct pairs of
    sentences and their associated entity tags, learn
    to extract entities from a new sentence.
  • Given a training set of correct biological
    alignments, learn to align two unknown sequences.
  • Given a training set of correct RNA secondary
    structures associated with a set of sequences,
    learn to determine the secondary structure of a
    new sequence.
  • This is not an exhaustive list of possible
    applications.

15
Learning in structured output spaces
  • Multilabel supervised classification (output
    y = (y1, ..., yn)).

Training set, hypothesis space, learning and prediction, as on slide 5, now with structured outputs y ∈ Y.
16
Learning in structured output spaces
  • Three main phases
  • Encoding
  • define a suitable feature map f(x,y).
  • Compression
  • characterize the output space in a synthetic and
    compact way.
  • Optimization
  • define a suitable objective function and use it
    for learning.

17
Learning in structured output spaces
  • Encoding
  • define a suitable feature map f(x,y).
  • Compression
  • characterize the output space in a synthetic and
    compact way.
  • Optimization
  • define a suitable objective function and use it
    for learning.

18
Encoding
S1 = ATGCTTTC, S2 = CTGTCGCC
  • Features must be defined in such a way that
    prediction can be computed efficiently.
  • The feature vector f(x,y) decomposes as a sum of
    elementary features over parts (in symbols below).
  • Parts are typically edges or nodes in graphs.
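In symbols, with p ranging over parts and y_p the restriction of y to part p (e.g. the label pair on an edge), the decomposition is:

    f(x, y) = \sum_{p} f_p(x, y_p)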

19
Encoding
Sequence labeling
Example: a CRF with HMM features (a standard
choice is sketched below).
In general, features reflect long-range
interactions (when labeling x_i, past and future
observations are taken into account). Arbitrary
features of the observations can be considered
(e.g. spelling properties in NER).
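One standard instance of HMM features for a chain CRF (an assumed concrete form, consistent with but not necessarily identical to the slide's lost definition) stacks transition and emission indicator counts:

    f(x, y) = \sum_{t} \big( [\, 1\{y_{t-1} = p, y_t = q\} \,]_{p,q} \;;\; [\, 1\{y_t = p, x_t = s\} \,]_{p,s} \big)

so that w^T f(x, y) plays the role of an HMM log-score.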
20
Encoding
Sequence alignment
  • 3-parameter model (a minimal instance is sketched
    below).
  • In practice, more complex models are used:
  • 4-parameter model: an affine function for gap
    penalties, i.e. different costs if the gap starts
    (gap opening penalty) at a given position or if
    it continues (gap extension penalty).
  • 211/212-parameter model: f(x,y) contains the
    statistics associated with the gap penalties and
    all possible pairs of amino acids.
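As a concrete instance of the 3-parameter model, a minimal Needleman-Wunsch sketch scoring a global alignment with one match, one mismatch and one linear gap parameter (the parameter values are illustrative, not the learned ones):

    import numpy as np

    def nw_score(s1, s2, match=1.0, mismatch=-1.0, gap=-1.0):
        """Optimal global alignment score under the 3-parameter model."""
        n, m = len(s1), len(s2)
        D = np.zeros((n + 1, m + 1))
        D[:, 0] = gap * np.arange(n + 1)  # prefix of s1 aligned to gaps
        D[0, :] = gap * np.arange(m + 1)  # prefix of s2 aligned to gaps
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                sub = match if s1[i - 1] == s2[j - 1] else mismatch
                D[i, j] = max(D[i - 1, j - 1] + sub,  # match/mismatch
                              D[i - 1, j] + gap,      # gap in s2
                              D[i, j - 1] + gap)      # gap in s1
        return D[n, m]

    print(nw_score("ATGCTTTC", "CTGTCGCC"))  # the running example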

21
Encoding
Sequence parsing
The feature vector contains the statistics
associated with the occurrences of the grammar rules.
[Figure: parse tree y.]
22
Encoding
  • Having defined these features, predictions can be
    computed efficiently with dynamic programming
    (DP); see the Viterbi sketch below.
  • Sequence labeling: Viterbi algorithm.
  • Sequence alignment: Needleman-Wunsch algorithm.
  • Sequence parsing: Cocke-Younger-Kasami (CYK)
    algorithm.

[Figure: DP table.]
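For sequence labeling, a minimal Viterbi sketch over additive scores; under the feature map above, init, trans and emit would be linear in w, but their names and shapes here are assumptions for illustration:

    import numpy as np

    def viterbi(obs, init, trans, emit):
        """Highest-scoring label sequence under additive (log-space) scores.

        obs: observation indices, length T. init[p], trans[p, q] and emit[p, s]
        score starting in p, moving p -> q, and emitting symbol s from p.
        """
        T, K = len(obs), len(init)
        delta = np.empty((T, K))            # best score of a prefix ending in each label
        back = np.zeros((T, K), dtype=int)  # argmax back-pointers
        delta[0] = init + emit[:, obs[0]]
        for t in range(1, T):
            cand = delta[t - 1][:, None] + trans  # cand[p, q]: come from p, move to q
            back[t] = cand.argmax(axis=0)
            delta[t] = cand.max(axis=0) + emit[:, obs[t]]
        y = [int(delta[-1].argmax())]
        for t in range(T - 1, 0, -1):       # follow back-pointers
            y.append(int(back[t, y[-1]]))
        return y[::-1]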
23
Learning in structured output spaces
  • Encoding
  • define a suitable feature map f(x,y).
  • Compression
  • characterize the output space in a synthetic and
    compact way.
  • Optimization
  • define a suitable objective function and use it
    for learning.

24
Computing moments
  • The number N of possible output vectors y_k given
    an observation x is typically huge.
  • To characterize the distribution of the scores,
    its mean and its variance are considered.
  • C and m can be computed efficiently with DP
    techniques (their definitions are reconstructed
    below).
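Consistent with the centering remark on slide 26, the moments in question are plausibly the mean and covariance of the feature vectors over all N candidate outputs:

    m = \frac{1}{N} \sum_{k=1}^{N} f(x, y_k)
    C = \frac{1}{N} \sum_{k=1}^{N} f(x, y_k) f(x, y_k)^T - m m^T

so that the score s_k = w^T f(x, y_k) has mean w^T m and variance w^T C w.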

25
Computing moments
Sequence labeling
The number N of possible label sequences y_k given
an observation sequence x is exponential in the
length of the sequence, so enumeration is out of
the question. An algorithm similar to the forward
algorithm is used to compute m and C; it yields,
for example, the mean value associated with the
feature representing the emission of a symbol q at
state p.
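A brute-force reference for m and C, exponential in the sequence length and therefore usable only on toy inputs, but one that any DP recursion must agree with (feat is an assumed user-supplied feature map):

    import itertools
    import numpy as np

    def moments_bruteforce(obs, K, feat):
        """Mean m and covariance C of f(x, y) over all K**len(obs) label sequences."""
        F = np.array([feat(obs, y)
                      for y in itertools.product(range(K), repeat=len(obs))])
        m = F.mean(axis=0)
        C = F.T @ F / len(F) - np.outer(m, m)  # centered second-order moment
        return m, C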
26
Computing moments
  • Basic idea behind the recursive formulas:
  • Mean values are computed as first-order moments of
    the features.
  • Variances are computed by centering the
    second-order moments.

27
Computing moments
  • Problem: high computational cost for large
    feature spaces.
  • 1st solution: exploit the structure and the
    sparseness of the covariance matrix C.
  • In sequence labeling with a CRF with HMM features,
    the number of distinct values in C is linear in
    the size of the observation alphabet.
  • 2nd solution: a sampling strategy (a sketch
    follows below).

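A minimal sketch of the sampling strategy, assuming label sequences are drawn uniformly at random (the transcript does not spell out the sampling distribution):

    import numpy as np

    def moments_sampled(obs, K, feat, n_samples, seed=0):
        """Monte Carlo estimates of m and C from n_samples random label sequences."""
        rng = np.random.default_rng(seed)
        ys = rng.integers(0, K, size=(n_samples, len(obs)))
        F = np.array([feat(obs, y) for y in ys])
        m = F.mean(axis=0)
        C = F.T @ F / len(F) - np.outer(m, m)  # centered second-order moment
        return m, C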
28
Learning in structured output spaces
  • Encoding
  • define a suitable feature map f(x,y).
  • Compression
  • characterize the output space in a synthetic and
    compact way.
  • Optimization
  • define a suitable objective function and use it
    for learning.

29
Z-score
  • A new optimization criterion, particularly suited
    to non-separable cases.
  • Minimize the number of output vectors with a score
    higher than the score of the correct pairs.
  • Maximize the Z-score (a reconstruction follows
    below).
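The Z-score formula itself is not in the transcript; from the moments m and C above, a plausible reconstruction for a training pair (x, y) is

    Z(w) = \frac{w^T (f(x, y) - m)}{\sqrt{w^T C w}}

i.e. the score of the correct output measured in standard deviations above the mean score of all candidate outputs.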

30
Z-score
  • The Z-score can be expressed as a function of the
    parameters w.
  • Two equivalent optimization problems (below).
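Since Z(w) is invariant to positive rescaling of w, writing b = f(x, y) - m, the two problems are plausibly

    \max_w \frac{w^T b}{\sqrt{w^T C w}}  \quad\Longleftrightarrow\quad  \min_w w^T C w \ \text{ s.t. } \ w^T b = 1

whose solution is proportional to C^{-1} b, matching the simple matrix inversion on the following slides.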

31
Z-score
  • Ranking loss.
  • An upper bound on the ranking loss is minimized:
  • the number of output vectors with a score higher
    than the score of the correct pairs is minimized.

32
Previous approaches
  • Minimize the number of incorrect macrolabels y:
  • CRFs [Lafferty et al., 01], HMSVM [Altun et al.,
    03], averaged perceptron [Collins, 02].
  • Minimize the number of incorrect microlabels y:
  • M3Ns [Taskar et al., 03], SVMISO [Tsochantaridis
    et al., 04].

33
SODA
  • Given a training set T, the empirical risk
    associated with the upper bound on the ranking loss
    is minimized.
  • An equivalent formulation in terms of C and b is
    considered to solve it.

SODA (Structured Output Discriminant Analysis)
34
SODA
  • Convex optimization.
  • If C is not PSD, regularization can be
    introduced.
  • Solution: a simple matrix inversion.
  • Fast conjugate gradient methods are available (a
    sketch follows below).
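A minimal sketch of this solution step, assuming it amounts to solving (C + \lambda I) w = b, where \lambda > 0 restores positive definiteness when C is not PSD:

    import numpy as np
    from scipy.sparse.linalg import LinearOperator, cg

    def soda_solve(C, b, lam=1e-6):
        """Solve (C + lam*I) w = b by conjugate gradient.

        CG only needs matrix-vector products, which suits a structured or
        sparse C; lam > 0 makes the system positive definite.
        """
        d = b.shape[0]
        A = LinearOperator((d, d), matvec=lambda v: C @ v + lam * v)
        w, info = cg(A, b)
        if info != 0:
            raise RuntimeError("conjugate gradient did not converge")
        return w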

35
Rademacher bound
  • The bound shows that learning based on the upper
    bound on the ranking loss is effectively
    achieved.
  • The bound also holds in the case where b and C
    are estimated by sampling.
  • Two directions of sampling:
  • For each training pair, only a limited number n
    of incorrect outputs is considered to estimate b
    and C.
  • Only a finite number l of input-output pairs is
    given in the training set.
  • The empirical expectation of the estimated loss
    (with b and C computed by random sampling) is a
    good approximate upper bound for the expected
    loss.
  • The latter is an upper bound for the ranking
    loss, so that the Rademacher bound is also a
    bound on the expectation of the ranking loss.

36
Rademacher bound
  • Theorem (Rademacher bound for SODA). With
    probability at least 1 − δ over the joint draw of
    the random sample T and of the random samples from
    the output space for each training pair that are
    taken to approximate b and C, the following bound
    holds for any w with squared norm smaller than c
    [the explicit bound is not in the transcript],
    whereby M is a constant and the number of random
    samples for each training pair is assumed equal
    to n.
  • The Rademacher complexity terms decrease with n
    and l respectively, so that the bound becomes
    tight for increasing n and l, as long as n grows
    faster than log(l).

37
Z-score approach
  • How to define the Z-score of a training set?
  • Another possible approach (independence
    assumption).
  • A convex optimization problem which can again be
    solved by a simple matrix inversion.
  • By maximizing the Z-score, most linear constraints
    are satisfied.

38
Iterative approach
  • One may want to impose the violated constraints
    explicitly.
  • This is again a convex optimization problem, which
    can be solved with an iterative algorithm similar
    to previous approaches (HMSVM [Altun et al., 03],
    averaged perceptron [Collins, 02]); a sketch
    follows on the next slide.
  • Optionally, relax the constraints (e.g. add slack
    variables for non-separable problems).

39
Iterative approach
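The algorithm on this slide is not in the transcript; a plausible constraint-generation loop of the kind just described, where solve and argmax_score are assumed callbacks (the convex solver of the previous slides and a DP decoder such as Viterbi):

    def iterative_train(pairs, solve, argmax_score, max_iter=50):
        """Alternate solving the convex problem and adding violated constraints."""
        constraints = []  # (x, correct y, offending y_hat) triples
        w = solve(constraints)
        for _ in range(max_iter):
            violated = [(x, y, argmax_score(w, x)) for x, y in pairs]
            violated = [c for c in violated if c[2] != c[1]]  # wrongly decoded pairs
            if not violated:
                break  # every training pair is decoded correctly
            constraints.extend(violated)
            w = solve(constraints)
        return w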
40
Experimental results
Sequence labeling: artificial data.
  • Chain CRF with HMM features.
  • Sequence length: 50. Training set size: 20
    pairs. Test set size: 100 pairs.
  • Comparison with SVMISO [Tsochantaridis et al.,
    04], perceptron [Collins, 02] and CRFs [Lafferty
    et al., 01].
  • Average number of incorrect labels, varying the
    noise level p.

41
Experimental results
Sequence labeling: artificial data.
  • HMM features.
  • Noise level p = 0.2.
  • Average number of incorrect labels and
    computational time as functions of the training
    set size.

42
Experimental results
Sequence labeling: artificial data.
Chain CRF with HMM features. Sequence length: 10.
Training set size: 50 pairs. Test set size: 100
pairs. Noise level p = 0.2. Comparison with SVMISO
[Tsochantaridis et al., 04]. Labeling error on the
test set and average training time as functions of
the observation alphabet size.
43
Experimental results
Sequence labeling: artificial data.
  • Chain CRF with HMM features.
  • Adding constraints is not very useful when data
    are noisy and not linearly separable.

44
Experimental results
Sequence labeling
NER: Spanish news wire articles (Special Session
of CoNLL02); 300 sentences with an average length
of 30 words. 9 labels: non-name, plus beginning
and continuation of person, organization, location
and miscellaneous names. Two sets of binary
features: S1 (HMM features) and S2 (S1 plus HMM
features for the previous and the next word).
Labeling error on the test set (5-fold
cross-validation).
45
Experimental results
Sequence alignment: artificial sequences.
Test error (number of incorrectly aligned pairs)
as a function of the training set size.
Original and reconstructed substitution matrices.

46
Experimental results
  • Sequence parsing.
  • G6 grammar from [Dowell and Eddy, 2004].
  • RNA sequences of five families extracted from the
    Rfam database [Griffiths-Jones et al., 2003].

Predictions with five-fold cross-validation.
47
Conclusions
  • New methods for learning in structured output
    spaces.
  • Accuracy comparable with state-of-the-art
    techniques.
  • Easy to implement (DP for matrix computations and
    a simple optimization problem).
  • Fast for large training sets and a reasonable
    number of features.
  • Mean and variance computations are parallelizable
    for large training sets.
  • Conjugate gradient techniques are used in the
    optimization phase.
  • Three applications analyzed: sequence labeling,
    sequence parsing and sequence alignment.
  • Future work:
  • Test the scalability of this approach using
    approximate techniques.
  • Develop a dual version with kernels.

48
Thank you