Title: Sequential and Spatial Supervised Learning
Guohua Hao, Rongkun Shen, Dan Vega, Yaroslav Bulatov, and Thomas G. Dietterich
School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon 97331
Abstract
Traditional supervised learning assumes
independence between the training examples.
However, many statistical learning problems
involve sequential or spatial data that are not
independent. Furthermore, the sequential or
spatial relationships can be exploited to improve
the prediction accuracy of a classifier. We are
developing and testing new practical methods for
machine learning with sequential and spatial
data. This poster gives a snapshot of our
current methods and results.
Methods
- Sliding window / recurrent sliding window (a feature-construction sketch follows this list)
- Experiment results:
- Divide the training data into a sub-training set and a development (validation) set. Try window sizes from 1 to 21 and tree sizes of 10, 20, 30, 50, and 70.
- The best window size was 11 and the best tree size was 20. With this configuration, the best number of training iterations was 110, which gave 66.3% correct predictions on the development set.
- Train on the entire training set with this configuration and evaluate on the test set. The result was 67.1% correct.
- Neural network sliding windows give better performance than this, so we are currently designing experiments to understand why!
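To make the sliding window concrete, here is a minimal sketch (illustrative code, not the exact implementation used in these experiments) that stacks a window of w consecutive feature vectors around each sequence position, zero-padding the ends, so any standard base learner can be trained on the windowed rows:

import numpy as np

def window_features(X, w=11):
    # X: (n, d) array of per-position feature vectors; w: odd window size.
    # Returns an (n, w * d) array in which row t holds the w feature vectors
    # centered at position t, with zero padding beyond the sequence ends.
    n, d = X.shape
    half = w // 2
    padded = np.vstack([np.zeros((half, d)), X, np.zeros((half, d))])
    return np.hstack([padded[i:i + n] for i in range(w)])

# Example: a sequence of 20 positions with 5 features each.
Xw = window_features(np.random.randn(20, 5), w=11)   # shape (20, 55)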
- A classifier is trained and run on the 8 rotations and reflections of the test set; a majority vote decides the final class (see the voting sketch after this list).
- A sliding window is used to group the input pixels into squares of varying size, and the same is done for the output window. Thus the label of a pixel depends not only on the pixel intensity values in its neighborhood, but also on the labels placed on the pixels in that neighborhood.
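The voting over the 8 rotations and reflections can be sketched as follows; predict_pixels is a hypothetical stand-in for whichever per-pixel classifier is used, and is assumed to return an integer label image of the same shape as its input:

import numpy as np

def predict_with_symmetries(image, predict_pixels):
    # Classify all 8 rotations/reflections of the image, map each prediction
    # back to the original orientation, and take a per-pixel majority vote.
    preds = []
    for flip in (False, True):
        view = np.fliplr(image) if flip else image
        for k in range(4):
            p = predict_pixels(np.rot90(view, k))
            p = np.rot90(p, -k)                        # undo the rotation
            preds.append(np.fliplr(p) if flip else p)  # undo the reflection
    stacked = np.stack(preds)                          # shape (8, H, W), integer labels
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, stacked)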
Figure: Protein secondary structure prediction pipeline. A protein AA sequence (e.g., >1avhb-4-AS IPAYL AETLY YAMKG AGTDD HTLIR VMVSR SEIDL FNIRK EFRKN FATSL YSMIK GDTSG DYKKA LLLLC GEDD) is used to generate a raw profile with PSI-BLAST; the profile is fed into the CRF for training and testing, and majority voting over the output predictions gives the final classification result. Companion diagrams show the sliding window and recurrent sliding window models over observations x_{t-1}, x_t, x_{t+1} and labels y_{t-1}, y_t, y_{t+1}.
Figure: Pixel labels produced by Naïve Bayes with IC = 1, OC = 3 (IC = input context, OC = output context) on each of the 8 rotations and reflections of the test image.
- Hidden Markov Model: joint distribution P(X, Y)
- Experiment results:
- Different window sizes affect not only the computation time but also the accuracy of the classifier.
- The J48 (C4.5) and Naïve Bayes classifiers are the most extensively studied. The results show that Naïve Bayes achieves higher accuracy with smaller sliding windows, while J48 does better with larger window sizes.
- Generalization of Naïve Bayesian networks
- Transition probability P(y_t | y_{t-1})
- Observation probability P(x_t | y_t)
- Because of the conditional independence assumptions, it is impractical to represent overlapping features of the observations
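For reference, the HMM factorization above can be written out directly; this is a generic sketch with toy probability tables, not the matrices estimated in our experiments:

import numpy as np

# Toy HMM: 2 states, 3 discrete observation symbols.
pi = np.array([0.6, 0.4])                          # initial distribution P(y_1)
A = np.array([[0.7, 0.3], [0.2, 0.8]])             # transitions P(y_t | y_{t-1})
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])   # observations P(x_t | y_t)

def log_joint(x, y):
    # log P(X, Y) = log P(y_1) + sum_t log P(y_t | y_{t-1}) + sum_t log P(x_t | y_t)
    logp = np.log(pi[y[0]]) + np.log(B[y[0], x[0]])
    for t in range(1, len(x)):
        logp += np.log(A[y[t - 1], y[t]]) + np.log(B[y[t], x[t]])
    return logp

print(log_joint(x=[0, 2, 1], y=[0, 1, 1]))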
Introduction
Figure 4: Chain-structured graphical model over labels y_{t-1}, y_t, y_{t+1} and observations x_{t-1}, x_t, x_{t+1}.
In classical supervised learning, we assume that the training examples are drawn independently and identically from some joint distribution P(x, y). However, many applications of machine learning involve predicting a sequence of labels for a sequence of observations. New learning methods are needed that can capture the possible interdependencies between labels. We can formulate this Sequential Supervised Learning problem as follows. Given a set of training examples of the form (X, Y), where
X = (x_1, x_2, ..., x_n) is a sequence of feature vectors and
Y = (y_1, y_2, ..., y_n) is the corresponding label sequence,
the goal is to find a classifier h that predicts a new X as Y = h(X).
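As a concrete illustration of this formulation (a minimal sketch; the array representation and the scikit-learn baseline are arbitrary illustrative choices), each example can be stored as a pair of arrays, and the simplest h ignores the label interdependencies entirely:

from typing import Callable, List, Tuple
import numpy as np
from sklearn.linear_model import LogisticRegression

# One example: X has shape (n, d) (feature vectors), Y has shape (n,) (labels).
Example = Tuple[np.ndarray, np.ndarray]

def fit_independent_h(examples: List[Example]) -> Callable[[np.ndarray], np.ndarray]:
    # Baseline h that treats every position independently: flatten all positions
    # of all sequences and fit a single per-position classifier.
    Xs = np.vstack([X for X, _ in examples])
    ys = np.concatenate([Y for _, Y in examples])
    clf = LogisticRegression(max_iter=1000).fit(Xs, ys)
    return lambda X: clf.predict(X)   # maps a feature-vector sequence to a label sequence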
- The official "shared task" of the CoNLL-2004 conference.
- For each verb in the sentence, find all of its arguments and label their semantic roles.
- Conditional Random Field: conditional distribution P(Y | X)
Figure: Semantic role labeling example. The sentence "He would n't accept anything of value from those he was writing about" is annotated with the roles V (verb), A0 (acceptor), A1 (thing accepted), A2 (accepted-from), AM-MOD (modal), and AM-NEG (negation).
- Extension of logistic regression to sequential data
- The label sequence Y forms a Markov random field globally conditioned on the observation X
- Removes the HMM independence assumption
- When compared to individual pixel classification (59%), it is easy to see that the recurrent sliding window allows a significant improvement in the accuracy of the classifier.
- Currently, the effect that bagging and boosting have on the accuracy is under investigation.
- Difficulty for machine learning:
- Humans use background knowledge to figure out semantic roles.
- There are 70 different semantic role tags, which makes the task computationally intensive.
- Experiment results:
- Two forms of feature induction in Conditional Random Fields: a regression tree approach and incremental field growing.
- There are 70 different semantic tags to learn. The training set contains 9,000 examples and the test set 2,000.
- Evaluated using the F-measure, the harmonic mean of precision and recall over the requested argument types (see the helper after this list).
- Both methods got similar performance, with an F-measure around 65. The best published performance was 71.72, using simple greedy left-to-right sequence labeling.
- Again, simpler non-relational approaches outperform the CRF on this task. Why?
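The F-measure used above is the standard harmonic mean of precision and recall; a small helper (ours, not the shared-task scorer) makes the definition explicit:

def f_measure(true_args, predicted_args):
    # true_args / predicted_args: sets of (role, start, end) argument tuples.
    true_args, predicted_args = set(true_args), set(predicted_args)
    correct = len(true_args & predicted_args)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted_args)
    recall = correct / len(true_args)
    return 2 * precision * recall / (precision + recall)

print(f_measure({("A0", 1, 1), ("V", 4, 4)}, {("A0", 1, 1), ("A1", 5, 7)}))  # 0.5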
- Potential function of the random field
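Written out for a standard linear-chain CRF (the form we assume here), the potential functions and the conditional distribution they define are:

P(Y \mid X) = \frac{1}{Z(X)} \prod_{t=1}^{n} \Psi_t(y_{t-1}, y_t, X), \qquad
\Psi_t(y_{t-1}, y_t, X) = \exp\Big(\sum_k \lambda_k f_k(y_{t-1}, y_t, X, t)\Big), \qquad
Z(X) = \sum_{Y'} \prod_{t=1}^{n} \Psi_t(y'_{t-1}, y'_t, X)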
Conclusions and Future work
- In recent years, substantial progress has been made on sequential and spatial supervised learning problems. This poster has attempted to review some of the existing methods and to present our current methods and experimental results on several applications. Future work will include:
- Developing methods that can handle a large number of classes
- Discriminative methods using large-margin principles
- Understanding why structural learning methods, such as CRFs, do not outperform classical methods on some structural learning problems
- Maximize the log likelihood
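Spelled out, the training objective is the conditional log likelihood of the labeled sequences, and its gradient with respect to each weight is the usual difference between empirical and expected feature counts (standard CRF training, stated here for reference):

L(\lambda) = \sum_i \log P\big(Y^{(i)} \mid X^{(i)}; \lambda\big), \qquad
\frac{\partial L}{\partial \lambda_k}
  = \sum_i \sum_t f_k\big(y^{(i)}_{t-1}, y^{(i)}_t, X^{(i)}, t\big)
  - \sum_i \sum_t \mathbb{E}_{P(Y \mid X^{(i)}; \lambda)}\big[f_k(y_{t-1}, y_t, X^{(i)}, t)\big]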
- Vertical relationships: as in normal supervised learning
- Horizontal relationships: interdependencies between the label variables, which can improve accuracy
- Parameter estimation:
- Iterative scaling and gradient descent: exponential number of parameters
- Gradient tree boosting: learns only the necessary interactions among features (sketched below)
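The functional-gradient idea behind gradient tree boosting can be sketched for plain binary classification with logistic loss (a generic illustration using scikit-learn regression trees with arbitrary settings, not the CRF-specific algorithm of Dietterich, Ashenfelter, and Bulatov, 2004):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_tree_boost(X, y, n_iter=100, n_leaves=8, lr=0.1):
    # Fit an additive model F(x) = sum_m lr * tree_m(x) for 0/1 labels y by
    # repeatedly fitting a small regression tree to the functional gradient
    # of the logistic loss, which is simply y - sigmoid(F).
    F = np.zeros(len(y))
    trees = []
    for _ in range(n_iter):
        residual = y - 1.0 / (1.0 + np.exp(-F))
        tree = DecisionTreeRegressor(max_leaf_nodes=n_leaves).fit(X, residual)
        F += lr * tree.predict(X)
        trees.append(tree)
    return trees   # predict with sigmoid(sum of lr * tree.predict(x))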
- Classification of remotely sensed images
- Discriminative methods: score function f(X, Y)
Examples include part-of-speech tagging, protein secondary structure prediction, etc. Extending 1-D observation and label sequences to 2-D arrays, we obtain a similar formulation for the Spatial Supervised Learning problem, where both X and Y have 2-D structure and there are interdependencies between the labels.
- Assign the crop identification classes
(unknown, sugar beets, stubble, bare soil,
potatoes, carrots) to pixels in the remotely
sensed image
Acknowledgement
We thank the National Science Foundation for supporting this research under grant number IIS-0307592.
- Averaged perceptron (Collins, 2002; sketched below)
- Hidden Markov support vector machine (Altun et al., 2003)
- Maximum Margin Markov Network (Taskar et al., 2003)
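The update rule shared by these methods is easiest to see in the structured (averaged) perceptron; viterbi_decode and joint_features below are hypothetical helpers that a real implementation would supply:

import numpy as np

def averaged_structured_perceptron(train, joint_features, viterbi_decode, n_feats, epochs=5):
    # train: list of (X, Y) sequence pairs.
    # joint_features(X, Y) -> feature-count vector of length n_feats.
    # viterbi_decode(X, w) -> highest-scoring label sequence under weights w.
    w = np.zeros(n_feats)
    w_sum = np.zeros(n_feats)
    steps = 0
    for _ in range(epochs):
        for X, Y in train:
            Y_hat = viterbi_decode(X, w)
            if not np.array_equal(Y_hat, Y):
                w += joint_features(X, Y) - joint_features(X, Y_hat)
            w_sum += w
            steps += 1
    return w_sum / steps   # averaged weights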
- The training and test sets are created by dividing the image in half with a horizontal line. The top half is used as the training set and the bottom half as the test set.
- Training set expansion: rotations and reflections of the training set increase the training set 8-fold (a sketch follows below).
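The 8-fold expansion can be generated directly from the training image and its label array (a short sketch using the same NumPy transforms as the voting sketch earlier):

import numpy as np

def expand_eightfold(image, labels):
    # Return the 8 rotations/reflections of (image, labels), transformed jointly
    # so that every pixel keeps its label in each expanded copy.
    out = []
    for flip in (False, True):
        img = np.fliplr(image) if flip else image
        lab = np.fliplr(labels) if flip else labels
        for k in range(4):
            out.append((np.rot90(img, k), np.rot90(lab, k)))
    return out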
Figure 5: Image with the true class labels. The upper part is the training example and the lower part is the test example.
References
- Dietterich, T. G. (2002). Machine learning for sequential data: A review. Structural, Syntactic, and Statistical Pattern Recognition (pp. 15-30). New York: Springer Verlag.
- Lafferty, J., McCallum, A., & Pereira, F. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. Proceedings of the 18th International Conference on Machine Learning (pp. 282-289). San Francisco, CA: Morgan Kaufmann.
- Dietterich, T. G., Ashenfelter, A., & Bulatov, Y. (2004). Training conditional random fields via gradient tree boosting. Proceedings of the 21st International Conference on Machine Learning (pp. 217-224). Banff, Canada.
- Jones, D. T. (1999). Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol., 292, 195-202.
- Cuff, J. A., & Barton, G. J. (2000). Application of multiple sequence alignment profiles to improve protein secondary structure prediction. Proteins: Structure, Function and Genetics, 40, 502-511.
- Carreras, X., & Màrquez, L. (2004). Introduction to the CoNLL-2004 shared task: Semantic role labeling. Proceedings of CoNLL-2004.
- Della Pietra, S., Della Pietra, V., & Lafferty, J. (1997). Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4), 380-393.
Applications
- Protein secondary structure prediction
Structural Supervised Learning: Given a graph G = (V, E), where each vertex v carries an (x_v, y_v) pair and some vertices are missing the y label, the goal is to predict the missing y labels.
- Assign the secondary structure classes (α-helix, β-sheet, and coil) to proteins
- The amino acid (AA) sequence leads to the tertiary and/or quaternary structure, which corresponds to the protein's function
- Use Position-Specific Scoring Matrix (PSSM) profiles to improve the prediction accuracy
- Use the CB513 dataset, with sequences shorter than 30 AA residues excluded, in our experiment
Figure 2: 2-D spatial model with label nodes y_{i,j}, y_{i,j+1}, y_{i+1,j}, y_{i+1,j+1} and observation nodes x_{i,j}, x_{i,j+1}, x_{i+1,j}, x_{i+1,j+1}.
Figure 6: Training set expansion (rotations and reflections of the training image).