Title: Sequence labeling and beam search
1. Sequence labeling and beam search
2. Outline
- Classification problem (Recap)
- Sequence labeling problem
- HMM and Viterbi algorithm
- Beam search
- MaxEnt case study
3. Classification Problem
4. Classification problem
- Setting
  - C: a finite set of labels
  - Input: x
  - Output: y, where y ∈ C.
- Training data: an instance list (x_i, y_i)
  - Supervised learning: y_i is known
  - Unsupervised learning: y_i is unknown
  - Semi-supervised learning: y_i is unknown for most instances.
5. The 1st step: data conversion
- Represent x as something else.
- Why?
  - The number of possible x is infinite.
  - The new representation makes learning possible.
- How? (see the sketch below)
  - Represent x as a feature vector.
  - Define feature templates: what part of x is useful for determining its y?
  - Calculate feature values.
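For illustration only (the templates and the example text are assumptions, not taken from the slides), feature templates and feature values for a text instance might be computed like this:

```python
def text_features(text):
    """Convert a raw instance x into a feature-value dict using two
    illustrative templates (chosen for illustration, not from the slides):
    lowercased unigrams, and a coarse length bucket."""
    features = {}
    for token in text.lower().split():
        features["unigram=" + token] = 1        # template 1: which words occur in x
    features["length=" + ("short" if len(text) < 40 else "long")] = 1  # template 2
    return features

# text_features("Time flies like an arrow")
# -> {'unigram=time': 1, 'unigram=flies': 1, 'unigram=like': 1,
#     'unigram=an': 1, 'unigram=arrow': 1, 'length=short': 1}
```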
6. The 2nd step: modeling
- kNN and Rocchio: find the closest neighbors / prototypes.
- DT and DL: find the matched group.
7. Modeling: NB and MaxEnt
- Given x, choose y, s.t.
  y* = arg max_y P(y | x) = arg max_y P(x, y)
- How to calculate P(x, y)?
- How many unique (x, y) pairs are there?
- How can we make the task simpler?
  - Decomposition (see the sketch below)
  - Number of parameters: 2^k * |C| → O(k * |C|)
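A minimal sketch of the Naive Bayes decomposition P(x, y) ≈ P(y) ∏_j P(f_j | y), computed in log space; the dictionaries `prior` and `cond` are assumed to hold already-estimated (smoothed) parameters, and the names are illustrative rather than from the slides:

```python
import math

def nb_classify(features, labels, prior, cond):
    """Pick y* = argmax_y P(y) * prod_j P(f_j | y), computed in log space.

    prior[y]     -- P(y)
    cond[(f, y)] -- P(f | y) for each active feature f
    (assumed, smoothed non-zero probabilities; names are illustrative)
    """
    best_label, best_logprob = None, float("-inf")
    for y in labels:
        logprob = math.log(prior[y])
        for f in features:
            logprob += math.log(cond[(f, y)])
        if logprob > best_logprob:
            best_label, best_logprob = y, logprob
    return best_label
```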
8. The 3rd step: training
- kNN: no training
- Rocchio: calculate prototypes
- DT and DL: learn the trees/rules by selecting important features and splitting the data
- NB: calculate the parameter values by simple counting
- MaxEnt: estimate the parameters iteratively
9. The 4th step: testing
- kNN: calculate the distance between x and its neighbors
- Rocchio: calculate the distance between x and the prototypes
- DT and DL: traverse the tree/list
- NB and MaxEnt: calculate P(x, y)
10. Attribute-value table
- Each row corresponds to an instance.
- Each column except the last one corresponds to a feature.
- No features refer to the class label.
- ⇒ At test time
  - the classification of x_i does not affect the classification of x_j.
  - all the feature values are available before testing starts.
11. Sequence labeling problem
12. Sequence labeling problem
- Task: to find the most probable labeling of a sequence.
- Examples
  - POS tagging
  - NP chunking
  - NE detection
  - Word segmentation
  - IGT detection
  - Parsing
  - ...
13. Questions
- Training data: (x_i, y_i)
  - What is x_i? What is y_i?
- What are the features?
- How to convert x_i to a feature vector for training data? How to do that for test data?
14. How to solve a sequence labeling problem?
- Using a sequence labeling algorithm, e.g., HMM
- Using a classification algorithm, in one of three ways (see the sketch after this list):
  - Don't use features that refer to class labels.
  - Use those features and get their values by running other processes.
  - Use those features and find a good (global) solution.
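To make the second option concrete, here is a minimal sketch of left-to-right classification-based tagging, where a feature may refer to the previously predicted tag; `classify` and `extract_features` are hypothetical placeholders for whatever classifier and feature templates are actually used:

```python
def greedy_sequence_label(words, classify, extract_features):
    """Label a sequence left to right with an ordinary classifier.

    classify(features) -> label and extract_features(words, i, prev_tag)
    are hypothetical placeholders, not names from the slides.
    A feature may refer to the previous *predicted* tag, whose value is
    produced by an earlier step of this same process.
    """
    tags = []
    for i, word in enumerate(words):
        prev_tag = tags[i - 1] if i > 0 else "<s>"   # label feature filled at run time
        features = extract_features(words, i, prev_tag)
        tags.append(classify(features))
    return tags
```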
15. Major steps
- Data conversion
  - What is the label set?
- Modeling
- Training
- Testing
  - How to combine individual labels to get a label sequence?
  - How to find a good label sequence?
16. HMM and Viterbi algorithm
17. Two types of HMMs
- State-emission HMM (Moore machine)
  - The emission probability depends only on the state (the from-state or the to-state).
- Arc-emission HMM (Mealy machine)
  - The emission probability depends on the (from-state, to-state) pair.
18. State-emission HMM
[Figure: states s1, s2, ..., sN, each emitting output symbols such as w1, w3, w4, w5]
- Two kinds of parameters
  - Transition probability: P(s_j | s_i)
  - Output (emission) probability: P(w_k | s_i)
- # of parameters: O(NM + N^2)
19. Arc-emission HMM
[Figure: states s1, s2, ..., sN, with output symbols such as w1, ..., w5 emitted on the arcs between states]
Same kinds of parameters, but the emission probabilities depend on both states: P(w_k, s_j | s_i).
# of parameters: O(N^2 M + N^2)
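As a rough illustration of the difference (the numbers are chosen only for illustration, not from the slides): with N = 50 states and M = 10,000 output symbols, the state-emission model needs about 50^2 + 50 * 10,000 = 502,500 parameters, while the arc-emission model needs about 50^2 + 50^2 * 10,000 = 25,002,500, since each (from-state, to-state) arc carries its own output distribution.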
20. Constraints
For any integer n and any HMM:
21. Properties of HMM
- Limited horizon
- Time invariance: the probabilities do not change over time
- The states are hidden because we know the structure of the machine (i.e., the state set S and the output alphabet Σ), but we don't know which state sequence generates a particular output.
22. Three fundamental questions for HMMs
- Finding the probability of an observation
- Finding the best state sequence
- Training: estimating the parameters
23. (2) Finding the best state sequence
- Given the observation O_{1,T} = o_1 ... o_T, find the state sequence X_{1,T+1} = X_1 ... X_{T+1} that maximizes P(X_{1,T+1} | O_{1,T}).
- ⇒ Viterbi algorithm
24. Viterbi algorithm
- The probability of the best path that produces o_1 ... o_{t-1} while ending up in state s_i
Initialization
Induction
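A minimal sketch of the Viterbi recursion for a state-emission HMM (the arc-emission case is analogous, with emissions conditioned on the arc); the parameter dictionaries `init`, `trans`, and `emit` are assumed names, not from the slides:

```python
def viterbi(obs, states, init, trans, emit):
    """Find the most probable state sequence for obs under a
    state-emission HMM (a sketch; parameter names are assumptions).

    init[s]      -- P(s at time 1)
    trans[s][t]  -- P(t | s)
    emit[s][o]   -- P(o | s)
    """
    # initialization: delta[0][s] is the best probability of ending in s after obs[0]
    delta = [{s: init[s] * emit[s][obs[0]] for s in states}]
    back = [{}]
    # induction: extend the best ("survivor") path into each state at each time step
    for t in range(1, len(obs)):
        delta.append({})
        back.append({})
        for s in states:
            prev, prob = max(((p, delta[t - 1][p] * trans[p][s]) for p in states),
                             key=lambda x: x[1])
            delta[t][s] = prob * emit[s][obs[t]]
            back[t][s] = prev
    # trace back from the best final state
    last = max(states, key=lambda s: delta[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path)), delta[-1][last]
```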
25. Important concepts
- State vs. class label
- Assumption: P(t_i | t_1 ... t_{i-1}) = P(t_i | t_{i-1})
- Multiple sequences of states (paths) can lead to a given state, but one is the most likely path to that state, called the "survivor path".
26. Viterbi search
27. Beam Search
28. Beam search (basic)
29. More options
- Expanding options: topN, minhyps
  - If hyps_num < minhyps
    - then use max(topN, minhyps) tags for w_i
    - else use topN tags
- Pruning options: maxhyps, beam, minhyps
  - Keep a hyp iff
    - prob(hyp) * beam > max_prob,
    - hyp is among the top maxhyps, or
    - hyp is among the top minhyps
30. Beam search
- Generate m tags for w_1, set s_{1j} accordingly.
- For i = 2 to n (n is the sentence length):
  - Expanding: for each surviving sequence s_{(i-1)j}
    - Generate m tags for w_i, given s_{(i-1)j} as the previous tag context.
    - Append each tag to s_{(i-1)j} to make a new sequence.
  - Pruning
- Return the highest-probability sequence s_{n1} (see the sketch below).
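A minimal sketch of this loop, assuming a scoring function tag_probs(words, i, prev_tags) that returns per-tag probabilities for w_i given the previous tags; the function name and the simple top-k pruning are illustrative, and the beam/minhyps options from the previous slide are omitted:

```python
import heapq

def beam_search_tag(words, tag_probs, m=3, maxhyps=10):
    """Beam-search tagging sketch.

    tag_probs(words, i, prev_tags) -> {tag: P(tag | context)} is an
    assumed scoring function; m and maxhyps mirror the expanding and
    pruning options above (only top-k pruning is shown here).
    """
    hyps = [(1.0, [])]                              # each hypothesis: (probability, tag sequence)
    for i in range(len(words)):
        expanded = []
        for prob, seq in hyps:                      # expanding
            scores = tag_probs(words, i, seq)
            top_m = heapq.nlargest(m, scores.items(), key=lambda kv: kv[1])
            for tag, p in top_m:
                expanded.append((prob * p, seq + [tag]))
        # pruning: keep only the best maxhyps hypotheses
        hyps = heapq.nlargest(maxhyps, expanded, key=lambda h: h[0])
    best_prob, best_seq = max(hyps, key=lambda h: h[0])
    return best_seq, best_prob
```

Unlike Viterbi, hypotheses that fall outside the beam are discarded permanently, which is why the result may be inexact.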
31. Viterbi vs. Beam search
- DP vs. heuristic search
- Globally optimal vs. inexact
- Small window vs. big window for features
32. Additional slides
33. (1) Finding the probability of the observation
- Forward probability: the probability of producing o_1 ... o_{t-1} while ending up in state s_i
34. Calculating forward probability
Initialization
Induction
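A matching sketch of the forward recursion (initialization, then induction), using the same assumed parameter dictionaries as the Viterbi sketch above; the only change from Viterbi is summing over predecessor states instead of maximizing:

```python
def forward_probability(obs, states, init, trans, emit):
    """P(obs) under a state-emission HMM via the forward algorithm
    (a sketch; init/trans/emit are the assumed parameter dictionaries
    used in the Viterbi sketch above).
    """
    # initialization: alpha[s] = P(s) * P(obs[0] | s)
    alpha = {s: init[s] * emit[s][obs[0]] for s in states}
    # induction: sum over all predecessor states instead of taking the max
    for o in obs[1:]:
        alpha = {s: sum(alpha[p] * trans[p][s] for p in states) * emit[s][o]
                 for s in states}
    # total probability of the observation
    return sum(alpha.values())
```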