Transcript and Presenter's Notes

Title: Sequence labeling and beam search


1
Sequence labeling and beam search
  • LING 572
  • Fei Xia
  • 2/15/07

2
Outline
  • Classification problem (Recap)
  • Sequence labeling problem
  • HMM and Viterbi algorithm
  • Beam search
  • MaxEnt case study

3
Classification Problem
4
Classification problem
  • Setting
  • C: a finite set of labels
  • Input: x
  • Output: y, where y ∈ C
  • Training data: an instance list (xi, yi)
  • Supervised learning: yi is known
  • Unsupervised learning: yi is unknown
  • Semi-supervised learning: yi is unknown for most
    instances.

5
The 1st step: data conversion
  • Represent x as something else.
  • Why?
  • The number of possible x is infinite.
  • The new representation makes the learning
    possible.
  • How?
  • Represent x as a feature vector.
  • Define feature templates: what part of x is
    useful for determining its y?
  • Calculate feature values.
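As a hedged illustration of feature templates and feature values (the templates below are made up for this sketch, not taken from the lecture), one instance might be converted like this:

```python
# A minimal sketch, assuming a few hand-picked feature templates
# (lower-cased word, 2-letter suffix, capitalization, previous word).
# These templates are illustrative, not the ones used in the course.

def extract_features(words, i):
    """Map the word at position i to a dictionary of feature values."""
    w = words[i]
    return {
        "word=" + w.lower(): 1,
        "suffix2=" + w[-2:].lower(): 1,
        "is_capitalized": int(w[0].isupper()),
        "prev_word=" + (words[i - 1].lower() if i > 0 else "<s>"): 1,
    }

if __name__ == "__main__":
    sent = ["Time", "flies", "like", "an", "arrow"]
    print(extract_features(sent, 1))
    # {'word=flies': 1, 'suffix2=es': 1, 'is_capitalized': 0, 'prev_word=time': 1}
```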

6
The 2nd step: modeling
  • kNN and Rocchio: find the closest neighbors /
    prototypes
  • DT and DL: find the matching group.

7
Modeling: NB and MaxEnt
  • Given x, choose y, s.t.
  • y* = arg maxy P(y | x) = arg maxy P(x, y)
  • How to calculate P(x, y)?
  • How many unique (x, y) pairs are there?
  • How can we make the task simpler?
  • Decomposition
  • Number of parameters: 2^k |C| → O(k |C|)
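For reference, a sketch of the standard Naive Bayes decomposition this slide alludes to (assuming k binary features; the exact formula on the slide is not recoverable from this transcript):

```latex
% Naive Bayes decomposition (standard form, reconstructed as an assumption):
\[
P(x, y) = P(y)\, P(f_1, \dots, f_k \mid y)
        \approx P(y) \prod_{j=1}^{k} P(f_j \mid y)
\]
% Without the independence assumption, a table over all (x, y) pairs needs
% on the order of 2^k |C| entries; with it, only O(k |C|) parameters
% P(f_j | y) plus |C| priors P(y).
```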

8
The 3rd step: training
  • kNN: no training
  • Rocchio: calculate prototypes
  • DT and DL: learn the trees/rules by selecting
    important features and splitting the data
  • NB: calculate the parameter values by simply
    counting
  • MaxEnt: estimate parameters iteratively

9
The 4th step: testing
  • kNN: calculate distance between x and its
    neighbors
  • Rocchio: calculate distance between x and the
    prototypes
  • DT and DL: traverse the tree/list
  • NB and MaxEnt: calculate P(x, y)

10
Attribute-value table
  • Each row corresponds to an instance
  • Each column except the last one corresponds to a
    feature.
  • No features refer to the class label.
  • → At test time:
  • the classification of xi does not affect the
    classification of xj.
  • all the feature values are available before
    testing starts.
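A toy attribute-value table (hypothetical feature values, for illustration only):

          f1   f2   f3   label
    x1     1    0    1    c1
    x2     0    1    0    c2
    x3     1    1    0    c1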

11
Sequence labeling problem
12
Sequence labeling problem
  • Task: to find the most probable labeling of a
    sequence.
  • Examples
  • POS tagging
  • NP chunking
  • NE detection
  • Word segmentation
  • IGT detection
  • Parsing

13
Questions
  • Training data: (xi, yi)
  • What is xi? What is yi?
  • What are the features?
  • How to convert xi to a feature vector for
    training data? How to do that for test data?
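As a hedged sketch of one answer (assuming xi is a word in context, yi its POS tag, and a previous-tag feature; the names are illustrative, not the course's setup):

```python
# A minimal sketch: convert a tagged sentence into (feature_dict, label)
# training instances. Feature names and tag usage are assumptions.

def instances_from_tagged_sentence(words, tags):
    """Yield one training instance per word."""
    for i, (w, t) in enumerate(zip(words, tags)):
        feats = {
            "word=" + w.lower(): 1,
            "prev_tag=" + (tags[i - 1] if i > 0 else "<s>"): 1,
        }
        yield feats, t

# At test time the gold previous tag is unavailable, so "prev_tag" must be
# filled with the tagger's own earlier decisions; that is the issue the
# beam search slides address.
```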

14
How to solve a sequence labeling problem?
  • Using a sequence labeling algorithm, e.g., HMM
  • Using a classification algorithm
  • Don't use features that refer to class labels
  • Use those features and get their values by
    running other processes
  • Use those features and find a good (global)
    solution.

15
Major steps
  • Data conversion
  • What is the label set?
  • Modeling
  • Training
  • Testing
  • How to combine individual labels to get a label
    sequence?
  • How to find a good label sequence?

16
HMM and Viterbi algorithm
17
Two types of HMMs
  • State-emission HMM (Moore machine)
  • The emission probability depends only on the
    state (from-state or to-state).
  • Arc-emission HMM (Mealy machine)
  • The emission probability depends on the
    (from-state, to-state) pair.

18
State-emission HMM

(Figure: states s1, s2, …, sN, each emitting output symbols such as w1, w3, w4, w5)
  • Two kinds of parameters
  • Transition probability: P(sj | si)
  • Output (emission) probability: P(wk | si)
  • # of parameters: O(NM + N²)

19
Arc-emission HMM
(Figure: states s1, s2, …, sN, with output symbols w1–w5 emitted on the transitions between states)
Same kinds of parameters, but the emission
probabilities depend on both states: P(wk, sj | si).
# of parameters: O(N²M + N²).
20
Constraints
For any integer n and any HMM
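The formula on this slide is not in the transcript; a sketch of the constraint usually stated in this form (a reconstruction, not recovered from the slide) is that the HMM defines a proper distribution over output sequences of each length:

```latex
% Reconstructed sketch of the usual normalization constraint:
\[
\sum_{w_{1,n}} P(w_{1,n}) = 1 \qquad \text{for every sequence length } n
\]
```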
21
Properties of HMM
  • Limited horizon
  • Time invariance: the probabilities do not change
    over time
  • The states are hidden because, although we know
    the structure of the machine (i.e., the state set
    S and the output alphabet Σ), we don't know which
    state sequence generates a particular output.

22
Three fundamental questions for HMMs
  1. Finding the probability of an observation
  2. Finding the best state sequence
  3. Training: estimating parameters

23
(2) Finding the best state sequence
  • Given the observation O1,T = o1 … oT, find the
    state sequence X1,T+1 = X1 … XT+1 that maximizes
    P(X1,T+1 | O1,T).
  • → Viterbi algorithm

24
Viterbi algorithm
  • δi(t): the probability of the best path that
    produces O1,t-1 while ending up in state si

Initialization
Induction
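The initialization and induction formulas were images and did not survive the export. A sketch of the standard Viterbi recurrences for an arc-emission HMM (notation assumed: a_ij = P(sj | si), b_ijo = P(o, sj | si)):

```latex
% Reconstructed sketch of the Viterbi recurrences (not copied from the slide):
\begin{align*}
\text{Initialization: } & \delta_j(1) = \pi_j\\
\text{Induction: }      & \delta_j(t+1) = \max_{1 \le i \le N} \delta_i(t)\, a_{ij}\, b_{i j o_t}\\
                        & \psi_j(t+1) = \arg\max_{1 \le i \le N} \delta_i(t)\, a_{ij}\, b_{i j o_t}
\end{align*}
```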
25
Important concepts
  • State vs. class label
  • Assumption: P(ti | t1,i-1) = P(ti | ti-1)
  • Multiple sequences of states (paths) can lead to
    a given state, but one is the most likely path to
    that state, called the "survivor path".

26
Viterbi search
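The trellis figure for this slide is not in the transcript. As a hedged illustration, a minimal Viterbi tagger over a state-emission HMM (probability tables made up for the example) could look like:

```python
# A minimal Viterbi sketch for a state-emission HMM; the tag set and the
# probability tables below are illustrative, not from the lecture.

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (best_prob, best_path) for the observation sequence."""
    delta = [{s: start_p[s] * emit_p[s].get(obs[0], 0.0) for s in states}]
    backptr = [{}]
    for t in range(1, len(obs)):
        delta.append({})
        backptr.append({})
        for j in states:
            prob, best_i = max(
                (delta[t - 1][i] * trans_p[i][j] * emit_p[j].get(obs[t], 0.0), i)
                for i in states
            )
            delta[t][j] = prob
            backptr[t][j] = best_i
    # Trace back the survivor path from the best final state.
    best_prob, last = max((delta[-1][s], s) for s in states)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    return best_prob, list(reversed(path))

if __name__ == "__main__":
    states = ["N", "V"]
    start_p = {"N": 0.6, "V": 0.4}
    trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
    emit_p = {"N": {"time": 0.5, "flies": 0.3}, "V": {"time": 0.1, "flies": 0.6}}
    print(viterbi(["time", "flies"], states, start_p, trans_p, emit_p))
```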
27
Beam Search
28
Beam search (basic)
29
More options
  • Expanding options: topN, minhyps
  • If hyps_num < minhyps
  • then use max(topN, minhyps) tags for w_i
  • else use topN tags
  • Pruning options: maxhyps, beam, minhyps
  • Keep a hyp iff
  • prob(hyp) * beam > max_prob
  • hyp is among the top maxhyps, or
  • hyp is among the top minhyps

30
Beam search
  • Generate m tags for w1, set s1j accordingly
  • For i = 2 to n (n is the sentence length)
  • Expanding: for each surviving sequence s(i-1)j
  • Generate m tags for wi, given s(i-1)j as the
    previous tag context
  • Append each tag to s(i-1)j to make a new
    sequence.
  • Pruning
  • Return the highest-probability sequence sn1.
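A hedged sketch of this loop in code (topN expansion plus a maxhyps cutoff; score_fn, top_n, and max_hyps are assumed names, and the scoring function stands in for a classifier such as MaxEnt):

```python
import heapq

# A minimal beam search sketch for sequence tagging. score_fn(prev_tags, i, tag)
# is a hypothetical stand-in for a classifier's log P(tag | context).

def beam_search(words, tagset, score_fn, top_n=3, max_hyps=5):
    """Return the highest-scoring tag sequence kept by the beam."""
    # Each hypothesis is (log_prob, tag_sequence).
    hyps = [(0.0, [])]
    for i in range(len(words)):
        new_hyps = []
        for log_p, seq in hyps:
            # Expanding: keep only the top_n tags for word i given this history.
            scored = sorted(((score_fn(seq, i, t), t) for t in tagset), reverse=True)
            for s, t in scored[:top_n]:
                new_hyps.append((log_p + s, seq + [t]))
        # Pruning: keep only the best max_hyps hypotheses.
        hyps = heapq.nlargest(max_hyps, new_hyps)
    return max(hyps)[1]
```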

31
Viterbi vs. Beam search
  • DP vs. heuristic search
  • Globally optimal vs. inexact
  • Small window vs. big window for features

32
Additional slides
33
(1) Finding the probability of the observation
  • Forward probability αi(t): the probability of
    producing O1,t-1 while ending up in state si
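A sketch of the standard definition (the slide's own formula is missing from the transcript; notation assumed):

```latex
% Reconstructed sketch: forward probability of producing O_{1,t-1} and
% being in state s_i at time t.
\[
\alpha_i(t) = P(O_{1,t-1},\; X_t = s_i)
\]
```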

34
Calculating forward probability
Initialization
Induction
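The formulas here were images and are missing; a sketch of the standard forward recurrences for an arc-emission HMM (notation assumed, matching the Viterbi sketch above):

```latex
% Reconstructed sketch of the forward recurrences (not copied from the slide):
\begin{align*}
\text{Initialization: } & \alpha_i(1) = \pi_i\\
\text{Induction: }      & \alpha_j(t+1) = \sum_{i=1}^{N} \alpha_i(t)\, a_{ij}\, b_{i j o_t}
\end{align*}
```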