1
Transformation-based error-driven learning (TBL)
  • LING 572
  • Fei Xia
  • 1/19/06

2
Outline
  • Basic concepts and properties
  • Relation between decision trees (DT), decision
    lists (DL), and TBL
  • Case study

3
Basic concepts and properties
4
TBL overview
  • Introduced by Eric Brill (1992)
  • Intuition:
    • Start with some simple solution to the problem
    • Then apply a sequence of transformations
  • Applications:
    • Classification problems
    • Other kinds of problems, e.g., parsing

5
TBL flowchart
6
Transformations
  • A transformation has two components:
    • A triggering environment, e.g., the previous tag
      is DT (the determiner tag)
    • A rewrite rule, e.g., change the current tag from
      MD to N
  • Example: If (prev_tag = DT) then MD → N
  • Similar to a rule in a decision tree, but the
    rewrite rule can be more complex (e.g., changing a
    parse tree)
  • ⇒ A transformation list is a processor and not
    (just) a classifier.
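A minimal Python sketch of the two components, assuming the transformation works on a list of POS tags. The names `Transformation` and `prev_tag_is` are hypothetical, and in this sketch triggers read the tags as they were before the transformation ran. General TBL allows richer rewrite rules (e.g., editing a parse tree); this one only rewrites a single tag.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical trigger type: inspects the tag sequence at position i.
Trigger = Callable[[List[str], int], bool]


@dataclass(frozen=True)
class Transformation:
    trigger: Trigger  # triggering environment
    from_tag: str     # rewrite rule: change from_tag ...
    to_tag: str       # ... to to_tag

    def apply(self, tags: List[str]) -> List[str]:
        """Rewrite every position where the trigger fires.

        Triggers read the input tags, so changes made by this
        transformation are not visible to later positions.
        """
        out = list(tags)
        for i, tag in enumerate(tags):
            if tag == self.from_tag and self.trigger(tags, i):
                out[i] = self.to_tag
        return out


def prev_tag_is(z: str) -> Trigger:
    """Trigger: the previous tag is z."""
    return lambda tags, i: i > 0 and tags[i - 1] == z


# If (prev_tag = DT) then MD -> N
rule = Transformation(prev_tag_is("DT"), "MD", "N")
print(rule.apply(["DT", "MD", "VB"]))  # ['DT', 'N', 'VB']
```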

7
Learning transformations
  1. Initialize each example in the training data with
     a simple classifier (the initial-state annotator).
  2. Consider all the possible transformations, and
     choose the one with the highest score.
  3. Append it to the transformation list and apply it
     to the training corpus to obtain a new corpus.
  4. Repeat steps 2-3 until the best score falls below
     a threshold.
  • ⇒ Steps 2-3 can be expensive; various methods try
    to address this.
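The learning loop can be sketched in Python as follows. This is a toy under stated assumptions: a single hypothetical template ("if the previous tag is prev then change src to dst"), the score of a candidate is its error reduction on the training corpus, and learning stops when no candidate reduces the error (`max_rules` is an extra hypothetical cap). The triple loop makes the cost of step 2 visible.

```python
from typing import List, Optional, Tuple

# Hypothetical rule template: (prev, src, dst) means
# "if the previous tag is prev then change src to dst".
Rule = Tuple[str, str, str]


def apply_rule(rule: Rule, tags: List[str]) -> List[str]:
    prev, src, dst = rule
    return [dst if t == src and i > 0 and tags[i - 1] == prev else t
            for i, t in enumerate(tags)]


def errors(tags: List[str], gold: List[str]) -> int:
    return sum(t != g for t, g in zip(tags, gold))


def learn(tags: List[str], gold: List[str], tagset: List[str],
          max_rules: int = 10) -> List[Rule]:
    """Greedy TBL learning; `tags` is the output of the initial annotator."""
    learned: List[Rule] = []
    for _ in range(max_rules):
        base = errors(tags, gold)
        best: Optional[Rule] = None
        best_gain = 0
        # Step 2: score every candidate transformation (the expensive part)
        for prev in tagset:
            for src in tagset:
                for dst in tagset:
                    if src == dst:
                        continue
                    rule = (prev, src, dst)
                    gain = base - errors(apply_rule(rule, tags), gold)
                    if gain > best_gain:
                        best, best_gain = rule, gain
        if best is None:  # nothing reduces the error any more: stop
            break
        # Step 3: append the rule and apply it to get the new corpus
        learned.append(best)
        tags = apply_rule(best, tags)
    return learned


initial = ["DT", "MD", "DT", "MD"]  # e.g., output of the initial annotator
gold = ["DT", "N", "DT", "N"]
print(learn(initial, gold, ["DT", "MD", "N"]))  # [('DT', 'MD', 'N')]
```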

8
Using TBL
  • The initial-state annotator
  • The space of allowable transformations:
    • Rewrite rules
    • Triggering environments
  • The objective function: minimize the error rate
    directly
    • for comparing the corpus to the truth
    • for choosing a transformation

9
Using TBL (cont)
  • Two more parameters:
    • Whether the effect of a transformation is visible
      to following transformations
    • If so, what is the order in which transformations
      are applied to a corpus?
      • left-to-right
      • right-to-left

10
The order matters
  • Transformation: If prev_label = A then change
    cur_label from A to B.
  • Input: A A A A A A
  • Output:
    • Not immediate results: A B B B B B
    • Immediate results, left-to-right: A B A B A B
    • Immediate results, right-to-left: A B B B B B
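The three outputs can be reproduced with a short Python sketch. The function name and the mode strings are hypothetical; "not immediate" means every trigger reads the input labels, while the immediate modes read labels already rewritten earlier in the same pass.

```python
from typing import List


def apply_prev_a_rule(labels: List[str], mode: str) -> List[str]:
    """If prev_label = A then change cur_label from A to B.

    mode = "not_immediate": triggers read the input labels
    mode = "ltr": immediate results, left-to-right
    mode = "rtl": immediate results, right-to-left
    """
    out = list(labels)
    if mode == "not_immediate":
        for i in range(1, len(labels)):
            if labels[i - 1] == "A" and labels[i] == "A":
                out[i] = "B"
        return out
    if mode == "ltr":
        order = range(1, len(out))
    elif mode == "rtl":
        order = range(len(out) - 1, 0, -1)
    else:
        raise ValueError(f"unknown mode: {mode}")
    for i in order:
        # Immediate: read labels already changed earlier in this pass
        if out[i - 1] == "A" and out[i] == "A":
            out[i] = "B"
    return out


for mode in ("not_immediate", "ltr", "rtl"):
    print(mode, " ".join(apply_prev_a_rule(list("AAAAAA"), mode)))
# not_immediate A B B B B B
# ltr A B A B A B
# rtl A B B B B B
```

Left-to-right with immediate results differs because each rewrite to B removes the trigger for the next position; right-to-left never destroys a trigger it still needs.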

11
Relation between DT, DL, and TBL
12
DT and TBL
  • DT is a subset of TBL
  • (Proof by induction on the depth of the DT)
  • When depth(DT) = 1 (root question X; leaf A if X
    holds, leaf B otherwise):
    • Label with S
    • If X then S → A
    • S → B

13
DT is a subset of TBL
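  • Depth = n: each subtree has an equivalent
    transformation list
    • Subtree 1: Label with S, then L1
    • Subtree 2: Label with S, then L2
  • Depth = n+1 (root question X over the two
    subtrees):
    • Label with S
    • If X then S → S'
    • S → S''
    • L1' (L1 applied to S' instead of S)
    • L2' (L2 applied to S'' instead of S)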
14
DT is a subset of TBL
  • Label with S
  • If X then S → S'
  • S → S''
  • L1' (L1 with each label X renamed to X')
  • L2' (L2 with each label X renamed to X'')
  • X' → X
  • X'' → X
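The inductive construction can be checked mechanically. The Python sketch below is one possible reading of it, not the slides' own code: the tree encoding, the rule triples, and all function names are hypothetical, and the renamed labels X' and X'' are modelled as tuples `("1", X)` and `("2", X)`. Questions depend only on the example, not on labels, so this is the static setting in which the construction applies.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Set, Tuple, Union

Example = Dict[str, bool]
Question = Callable[[Example], bool]
# Hypothetical encoding: a leaf is a label string, an internal node is
# (question X, subtree for "yes", subtree for "no").
Tree = Union[str, Tuple[Question, "Tree", "Tree"]]
# Renamed labels are tuples: ("1", S) plays the role of S', ("2", S) of S''.
Label = Union[str, Tuple[str, "Label"]]
Rule = Tuple[Optional[Question], Label, Label]  # (condition or None, from, to)


def classify(tree: Tree, x: Example) -> str:
    while not isinstance(tree, str):
        question, yes, no = tree
        tree = yes if question(x) else no
    return tree


def leaves(tree: Tree) -> Set[str]:
    if isinstance(tree, str):
        return {tree}
    return leaves(tree[1]) | leaves(tree[2])


def rename(rules: List[Rule], mark: str) -> List[Rule]:
    """Rename every label X to (mark, X) so the sublists cannot interfere."""
    return [(q, (mark, src), (mark, dst)) for q, src, dst in rules]


def dt_to_tbl(tree: Tree) -> List[Rule]:
    """Transformations to run after the initial step 'Label with S'."""
    if isinstance(tree, str):
        return [(None, "S", tree)]                     # S -> leaf label
    question, yes, no = tree
    rules: List[Rule] = [(question, "S", ("1", "S")),  # If X then S -> S'
                         (None, "S", ("2", "S"))]      # S -> S''
    rules += rename(dt_to_tbl(yes), "1")               # L1'
    rules += rename(dt_to_tbl(no), "2")                # L2'
    rules += [(None, ("1", y), y) for y in leaves(yes)]  # X' -> X
    rules += [(None, ("2", y), y) for y in leaves(no)]   # X'' -> X
    return rules


def run_tbl(rules: List[Rule], x: Example) -> Label:
    label: Label = "S"                                 # Label with S
    for q, src, dst in rules:
        if label == src and (q is None or q(x)):
            label = dst
    return label


tree: Tree = (lambda x: x["a"], (lambda x: x["b"], "A", "B"), "C")
rules = dt_to_tbl(tree)
for a, b in product([False, True], repeat=2):
    x = {"a": a, "b": b}
    assert run_tbl(rules, x) == classify(tree, x)
```

The renaming step matters when both subtrees use the same leaf label: without it, the transformations in L2' could also fire on examples that L1' had already labeled.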
15
DT is a proper subset of TBL
  • There exists a problem that can be solved by TBL
    but not by a DT, for a fixed set of primitive
    queries.
  • Ex: Given a sequence of characters:
    • Classify a char based on its position
    • If pos % 4 == 0 then yes, else no
    • Input attributes available: the previous two
      chars

16
  • Transformation list (applied left-to-right, with
    immediate results):
    • Label with S:
      A/S A/S A/S A/S A/S A/S A/S
    • If there is no previous character, then S → F:
      A/F A/S A/S A/S A/S A/S A/S
    • If the char two to the left is labeled with F,
      then S → F:
      A/F A/S A/F A/S A/F A/S A/F
    • If the char two to the left is labeled with F,
      then F → S:
      A/F A/S A/S A/S A/F A/S A/S
    • F → yes
    • S → no
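The transformation list above can be run directly. In this Python sketch (function names are hypothetical), each transformation is applied left-to-right with immediate results, and the triggers only look at the current position and the label two characters back. The printed result marks exactly the positions where pos % 4 == 0.

```python
from typing import Callable, List

Labels = List[str]


def apply_ltr(labels: Labels, src: str, dst: str,
              trigger: Callable[[Labels, int], bool]) -> Labels:
    """Apply one transformation left-to-right with immediate results."""
    out = list(labels)
    for i in range(len(out)):
        if out[i] == src and trigger(out, i):  # trigger sees earlier rewrites
            out[i] = dst
    return out


def no_prev_char(labels: Labels, i: int) -> bool:
    return i == 0


def two_left_is_f(labels: Labels, i: int) -> bool:
    return i >= 2 and labels[i - 2] == "F"


def classify_positions(n: int) -> List[str]:
    labels = ["S"] * n                                    # Label with S
    labels = apply_ltr(labels, "S", "F", no_prev_char)    # no previous char: S -> F
    labels = apply_ltr(labels, "S", "F", two_left_is_f)   # two to the left is F: S -> F
    labels = apply_ltr(labels, "F", "S", two_left_is_f)   # two to the left is F: F -> S
    return ["yes" if t == "F" else "no" for t in labels]  # F -> yes, S -> no


print(classify_positions(7))
# ['yes', 'no', 'no', 'no', 'yes', 'no', 'no']
```

The last transformation works only because its effect at position 2 is visible when position 4 is processed; a DT asking about the same two preceding characters sees identical input everywhere and cannot separate the positions.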

17
DT and TBL
  • TBL is more powerful than DT
  • The extra power of TBL comes from:
    • Transformations are applied in sequence
    • Results of previous transformations are visible
      to following transformations.

18
DL and TBL
  • DL is a proper subset of TBL.
  • In two-class TBL:
    • (if q then y' → y) ≡ (if q then y), where y' is
      the other class
    • If multiple transformations apply to an example,
      only the last one matters

19
Two-class TBL ⇔ DL?
  • Two-class TBL ⇒ DL:
    • Replace "if q then y' → y" with "if q then y"
    • Reverse the rule order
  • DL ⇒ two-class TBL:
    • Replace "if q then y" with "if q then y' → y"
    • Reverse the rule order
  • The equivalence does not hold for dynamic problems
    • Dynamic problem: the answers to questions are not
      static
    • Ex: in POS tagging, when the tag of a word is
      changed, it changes the answers to questions for
      nearby words.
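For a static problem, the DL ⇒ two-class TBL direction can be sketched in Python. The assumptions: queries depend only on the example, and the DL's default class becomes the TBL initial label (the slides do not spell out how defaults are handled). All names are hypothetical. Reversing the list again recovers the DL.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Example = Dict[str, bool]
Query = Callable[[Example], bool]
# Decision list: ordered (query, class) rules plus a default class.
DecisionList = Tuple[List[Tuple[Query, str]], str]


def dl_classify(dl: DecisionList, x: Example) -> str:
    rules, default = dl
    for q, y in rules:  # the first matching rule wins
        if q(x):
            return y
    return default


def dl_to_tbl(dl: DecisionList) -> Tuple[str, List[Tuple[Query, str]]]:
    """Initial label = DL default; transformations = DL rules reversed."""
    rules, default = dl
    return default, list(reversed(rules))


def tbl_classify(initial: str, transformations: List[Tuple[Query, str]],
                 x: Example) -> str:
    label = initial
    for q, y in transformations:
        # Two classes: "if q then y' -> y" has the same effect as "if q then y"
        if q(x):
            label = y  # a later match overwrites earlier ones
    return label


dl: DecisionList = ([(lambda x: x["a"] and x["b"], "yes"),
                     (lambda x: x["a"], "no"),
                     (lambda x: x["c"], "yes")], "no")
initial, transformations = dl_to_tbl(dl)
for a, b, c in product([False, True], repeat=3):
    x = {"a": a, "b": b, "c": c}
    assert tbl_classify(initial, transformations, x) == dl_classify(dl, x)
```

The reversal works because "first match wins" in a DL corresponds to "last match wins" in TBL. In a dynamic problem the queries would also read neighbouring labels that earlier transformations have changed, and this argument no longer applies.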

20
DT, DL, and TBL (summary)
  • k-DT is a proper subset of k-DL.
  • DL is a proper subset of TBL.
  • The extra power of TBL comes from:
    • Transformations are applied in sequence
    • Results of previous transformations are visible
      to following transformations.
  • TBL transforms training data; it does not split
    training data.
  • TBL is a processor, not just a classifier.

21
Case study
22
TBL for POS tagging
  • The initial-state annotator: the most common tag
    for each word.
  • The space of allowable transformations:
    • Rewrite rules: change cur_tag from X to Y.
    • Triggering environments (feature types):
      unlexicalized or lexicalized
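A Python sketch of the initial-state annotator: count tags per training word and keep the most frequent one. The slide does not say how unseen words are handled, so the `unknown_tag` fallback (here `"N"`) is an assumption, and the function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, List, Tuple


def train_lexicon(tagged: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map each training word to the tag it occurs with most often."""
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for word, tag in tagged:
        counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def initial_tags(words: List[str], lexicon: Dict[str, str],
                 unknown_tag: str = "N") -> List[str]:
    """Initial-state annotator; unknown_tag for unseen words is an assumption."""
    return [lexicon.get(w, unknown_tag) for w in words]


train = [("the", "DT"), ("can", "MD"), ("can", "MD"), ("can", "N"),
         ("will", "MD")]
lexicon = train_lexicon(train)
print(initial_tags(["the", "can", "rusted"], lexicon))  # ['DT', 'MD', 'N']
```

Here "the can" gets the initial tags DT MD. Fixing that error is exactly the kind of job left to a learned transformation such as "if prev_tag = DT then MD → N".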

23
Unlexicalized features
  • t-1 is z
  • t-1 or t-2 is z
  • t-1 or t-2 or t-3 is z
  • t-1 is z and t+1 is w

24
Lexicalized features
  • w0 is w.
  • w-1 is w
  • w-1 or w-2 is w
  • t-1 is z and w0 is w.
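The feature templates can be written as Python functions that build trigger predicates. The trigger signature `(words, tags, i)` and all function names are hypothetical. The slides' value `w` is renamed `v` here so that it does not clash with the word sequence. Positions outside the sentence return `None`, which also avoids Python's negative indexing.

```python
from typing import Callable, List, Optional

# Hypothetical trigger signature: (words, tags, i) -> does it fire at i?
Trigger = Callable[[List[str], List[str], int], bool]


def at(seq: List[str], i: int) -> Optional[str]:
    """Element at position i, or None outside the sentence."""
    return seq[i] if 0 <= i < len(seq) else None


# Unlexicalized templates (tags only)
def t_prev(z: str) -> Trigger:                      # t-1 is z
    return lambda w, t, i: at(t, i - 1) == z


def t_prev_any(z: str, k: int) -> Trigger:          # t-1 or ... or t-k is z
    return lambda w, t, i: any(at(t, i - j) == z for j in range(1, k + 1))


def t_prev_and_next(z: str, v: str) -> Trigger:     # t-1 is z and t+1 is v
    return lambda w, t, i: at(t, i - 1) == z and at(t, i + 1) == v


# Lexicalized templates (also look at words)
def w_cur(v: str) -> Trigger:                       # w0 is v
    return lambda w, t, i: at(w, i) == v


def w_prev_any(v: str, k: int) -> Trigger:          # w-1 (or ... or w-k) is v
    return lambda w, t, i: any(at(w, i - j) == v for j in range(1, k + 1))


def t_prev_and_w_cur(z: str, v: str) -> Trigger:    # t-1 is z and w0 is v
    return lambda w, t, i: at(t, i - 1) == z and at(w, i) == v


words = ["the", "can", "rusted"]
tags = ["DT", "MD", "VBD"]
print(t_prev_and_w_cur("DT", "can")(words, tags, 1))  # True
```

A learner instantiates each template with every observed tag or word value to generate the candidate transformations. This is one reason step 2 of learning is expensive.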

25
TBL for POS tagging (cont)
  • The objective function: tagging accuracy
    • for comparing the corpus to the truth
    • for choosing a transformation: choose the one
      that results in the greatest error reduction.
  • The order of applying transformations:
    left-to-right.
  • The results of applying transformations are not
    visible to other transformations.
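A Python sketch of the selection step: each candidate is scored by its error reduction, meaning errors before minus errors after applying it to the current corpus. The `Candidate` type, function names, and toy data are hypothetical. Triggers read the tags as they stood before the transformation, so the result does not depend on the left-to-right scan order.

```python
from typing import Callable, List, NamedTuple

Trigger = Callable[[List[str], List[str], int], bool]


class Candidate(NamedTuple):
    name: str
    trigger: Trigger
    src: str
    dst: str


def apply_candidate(c: Candidate, words: List[str], tags: List[str]) -> List[str]:
    """Scan left-to-right; triggers read the tags from before this pass."""
    return [c.dst if tag == c.src and c.trigger(words, tags, i) else tag
            for i, tag in enumerate(tags)]


def errors(tags: List[str], gold: List[str]) -> int:
    return sum(a != b for a, b in zip(tags, gold))


def error_reduction(c: Candidate, words: List[str], tags: List[str],
                    gold: List[str]) -> int:
    return errors(tags, gold) - errors(apply_candidate(c, words, tags), gold)


def choose(cands: List[Candidate], words: List[str], tags: List[str],
           gold: List[str]) -> Candidate:
    return max(cands, key=lambda c: error_reduction(c, words, tags, gold))


words = ["the", "can", "the", "can", "will"]
tags = ["DT", "MD", "DT", "MD", "MD"]  # initial-state annotator output
gold = ["DT", "N", "DT", "N", "MD"]
cands = [
    Candidate("t-1 is DT: MD->N", lambda w, t, i: i > 0 and t[i - 1] == "DT", "MD", "N"),
    Candidate("t-1 is MD: MD->N", lambda w, t, i: i > 0 and t[i - 1] == "MD", "MD", "N"),
    Candidate("w0 is will: MD->N", lambda w, t, i: w[i] == "will", "MD", "N"),
]
print([error_reduction(c, words, tags, gold) for c in cands])  # [2, -1, -1]
print(choose(cands, words, tags, gold).name)  # t-1 is DT: MD->N
```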

26
Learned transformations
27
Experiments
28
Uncovered issues
  • Efficient learning algorithms
  • Probabilistic TBL
  • Top-N hypotheses
  • Confidence measure

29
TBL Summary
  • TBL is more powerful than DL and DT.
    • Transformations are applied in sequence.
    • It can handle dynamic problems well.
  • TBL is more than a classifier:
    • Classification problems: POS tagging
    • Other problems: e.g., parsing
  • TBL performs well because it minimizes errors
    directly.
  • Learning can be expensive ⇒ various methods