Combining Classifiers for Chinese Word Segmentation
Transcript and Presenter's Notes

1
Combining Classifiers for Chinese Word
Segmentation
  • Nianwen Xue
  • Institute for Research in Cognitive Science
  • Susan P. Converse
  • Department of Computer and Information Science
  • University of Pennsylvania

2
Organization of the presentation
  • The Chinese word segmentation problem
  • Recasting the problem
  • Supervised machine learning approach combining
    classifiers
  • A maximum entropy tagger
  • A transformation-based tagger
  • Experiments
  • Conclusion and future work

3
The Chinese word segmentation problem
  • Ambiguity (example from Richard Sproat)
  • 日文 章鱼 怎么 说 ?
  • Japanese octopus how say
  • "How do you say octopus in Japanese?"
  • 日 文章 鱼 怎么 说 ?
  • Japan article fish how say
  • New words
  • Proper names (罗斯福(路) 'Roosevelt (Road)')
  • Abbreviations
  • Neologisms (手机 'cell phone')

4
The Chinese word segmentation problem
A common approach has been to view the problem as
a substring-matching problem and to use the
maximum matching algorithm (sketched below).
  • This formulation of the problem implies the use
    of a dictionary
  • Difficulties with this approach are
  • Ambiguity: a string can map onto different
    sequences of words from the dictionary
  • Words not found in the dictionary being used
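
  • As an illustration (not the authors' code), forward maximum
    matching can be sketched in a few lines of Python; the
    dictionary is assumed to be a set of word strings, and the
    four-character lookahead echoes the typical word length:

    def max_match(sentence, dictionary, max_len=4):
        # Greedy forward maximum matching: at each position take the
        # longest dictionary word; fall back to a single character.
        words, i = [], 0
        while i < len(sentence):
            for j in range(min(len(sentence), i + max_len), i, -1):
                if sentence[i:j] in dictionary or j == i + 1:
                    words.append(sentence[i:j])
                    i = j
                    break
        return words

    # max_match('日文章鱼怎么说', {'日文', '文章', '章鱼', '怎么', '说'})
    # greedily yields ['日文', '章鱼', '怎么', '说']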

5
Organization of the presentation
  • The Chinese word segmentation problem
  • Recasting the problem
  • Supervised machine learning approach combining
    classifiers
  • A maximum entropy tagger
  • A transformation-based tagger
  • Experiments
  • Conclusion and future work

6
A different approach to solving the segmentation
problem
  • Word segmentation is difficult because a given
    Chinese character can occur in different
    positions within a word
  • e.g., 产 'produce' by itself;
    产品 'product', with 产 on the left;
    生产力 'productivity', with 产 in the middle;
    投产 'start production', with 产 on the right
  • If we could reliably determine the position of
    each character within its word in a string of
    words, the problem of word segmentation would be
    solved

7
Towards a solution: tag the character position
  • Assign a tag to each character in a sentence
    based on the position of the character within a
    word (a POC tag):
    by itself: 产/LR, as in 产 'produce'
    on the left: 产/LL, as in 产品 'product'
    in the middle: 产/MM, as in 生产力 'productivity'
    on the right: 产/RR, as in 投产 'start production'
  • Ambiguity arises when a character has multiple
    possible tags:
    日/LL,LR 文/RR,LL 章/LL,RR 鱼/RR,LR 怎/LL 么/RR 说/LR
  • The task then is to pick the correct tag based on
    the context, in a manner similar to the
    part-of-speech tagging problem.
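
  • A toy sketch of this tag assignment in Python (the function
    name is ours; the four tags are from the slide):

    def poc_tags(word):
        # LR = single-character word, LL = leftmost character,
        # MM = middle character(s), RR = rightmost character.
        if len(word) == 1:
            return [(word, 'LR')]
        return ([(word[0], 'LL')]
                + [(c, 'MM') for c in word[1:-1]]
                + [(word[-1], 'RR')])

    # poc_tags('生产力') -> [('生', 'LL'), ('产', 'MM'), ('力', 'RR')]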

8
The advantages of this reformulation of the
problem
  • Easier to manipulate than the N-gram rules used
    in other machine-learning approaches
  • Easy to take advantage of new advances in POS
    tagging technology

9
Feasibility
  • Chinese characters are distributed in a
    constrained manner. Some characters are not
    ambiguous, or are not ambiguous in all possible
    ways.
  • Chinese words are generally short, typically
    fewer than four characters

10
Contrast with the dictionary-based formulation
  • Substring ambiguity is turned into a different
    type of ambiguity in which a character can have
    multiple tags.
  • Although new words are common in Chinese since
    some word formation processes are highly
    productive, new characters are less common. More
    likely to see a new combination of characters
    than a new character.

11
Organization of the presentation
  • The Chinese word segmentation problem
  • Recasting the problem
  • Supervised machine learning approach combining
    classifiers
  • A maximum entropy tagger
  • A transformation-based tagger
  • Experiments
  • Conclusion and future work

12
Combining Classifiers for Word Segmentation
  • Supervised learning approach to POC tag
    assignment that combines
  • a Maximum Entropy tagger with
  • a Transformation-based Error-driven tagger
  • Division of labor
  • using the maximum entropy tagger as the main
    workhorse
  • using the transformation-based tagger to clean up
    tagging inconsistencies
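
  • Schematically, the division of labor might look like this
    (all names are placeholders; apply_rules and tags_to_words
    are sketched on later slides):

    def segment(sentence, maxent_tagger, rules):
        chars = list(sentence)
        tags = maxent_tagger.tag(chars)          # main workhorse
        tags = apply_rules(rules, chars, tags)   # clean-up pass
        return tags_to_words(chars, tags)        # tags -> words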

13
Training procedure
  • Convert a manually segmented training corpus
    into a corpus of tagged characters
  • Train a maximum entropy tagger on the tagged
    corpus
  • Train a transformation-based tagger using the
    output of the maxent tagger as input and the
    manually segmented corpus as the reference
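
  • A sketch of the corpus conversion, reusing poc_tags from the
    POC-tag slide (the next slide shows a real derived sequence):

    def sentence_to_training(words):
        # Flatten a manually segmented sentence, given as a list of
        # words, into one (character, POC tag) pair per character.
        return [pair for w in words for pair in poc_tags(w)]

    # sentence_to_training(['日文', '章鱼']) ->
    # [('日', 'LL'), ('文', 'RR'), ('章', 'LL'), ('鱼', 'RR')]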

14
Example of POC training data
  • A Manually Segmented Sentence
  • ?? ?? ? ?? ?? ?? ???? ,?? ??
  • ?? ?? ?? ?? , ?? ?? ?? ?? ?? ??
  • ? ?? ?
  • A POC-tagged Sequence Automatically Derived from
    the Manual Segmentation
  • ?_LL ?_RR ?_LL ?_RR ?_LR ?_LL ?_RR ?_LL ?_RR ?_LL
  • ?_RR ?_LL ?_MM ?_MM ?_RR ,_LR ?_LL ?_RR ?_LL
  • ?_RR ?_LL ?_RR ?_LL ?_RR ?_LL ?_RR ?_LL ?_RR ,_LR
  • ?_LL ?_RR ?_LL ?_RR ?_LL ?_RR ?_LL ?_RR ?_LL ?_RR
  • ?_LL ?_RR ?_LR ?_LL ?_RR ?_LR

15
Testing procedure
  • POC tag the testing corpus (different from the
    training corpus) with the maximum entropy tagger
  • Clean up some tagging inconsistencies with the
    transformation-based tagger
  • Convert the output into a segmented corpus (one
    possible conversion is sketched below)
  • There will be problems when there are
    inconsistent tagging sequences, e.g. LL,LL
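
  • One possible conversion, sketched in Python; the convention
    that every LL or LR opens a new word is ours, and it is one
    way to force a segmentation out of inconsistent sequences:

    def tags_to_words(chars, tags):
        # LL and LR open a new word; MM and RR extend the current one.
        # An inconsistent pair such as LL,LL is resolved by closing
        # the pending word as soon as a word-opening tag is seen.
        words, current = [], ''
        for ch, tag in zip(chars, tags):
            if tag in ('LL', 'LR') and current:
                words.append(current)
                current = ''
            current += ch
        if current:
            words.append(current)
        return words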

16
Example of tagging by MaxEnt tagger
  • The same example tagged by the Maximum Entropy
    tagger (note the tagging inconsistency)
  • ?_LL ?_RR ?_LL ?_RR ?_LR ?_LL ?_RR
  • ?_LL ?_RR ?_LL ?_RR ?_LL ?_RR ?_LR
  • ?_LL ,_LR ?_LL ?_RR ?_LL ?_RR ?_LL
  • ?_RR ?_LL ?_RR ?_LL ?_RR ?_LL
  • ?_RR ,_LR ?_LL ?_RR ?_LL ?_RR ?_LL ?_RR
  • ?_LL ?_RR ?_LL ?_RR ?_LL ?_RR ?_LR
  • ?_LL ?_RR ?_LR

17
Example after transformations
  • The tagging inconsistency is fixed, but the
    tagging is still wrong in this case
  • ?_LL ?_RR ?_LL ?_RR ?_LR
  • ?_LL ?_RR ?_LL ?_RR ?_LL ?_RR ?_LL
  • ?_RR ?_LR ?_LR ,_LR ?_LL ?_RR
  • ?_LL ?_RR ?_LL ?_RR ?_LL ?_RR ?_LL
  • ?_RR ?_LL ?_RR ,_LR ?_LL ?_RR
  • ?_LL ?_RR ?_LL ?_RR ?_LL ?_RR ?_LL
  • ?_RR ?_LL ?_RR ?_LR ?_LL ?_RR

18
Training the maximum entropy tagger (1)
  • Features encode contextual information that is
    useful in predicting the tag of a character.
  • Examples
  • If the current character is W(i), then it should
    be tagged T(i)
  • If the previous character is tagged T(i-1), then
    the current character should be tagged T(i)
  • If the previous character is W(i-1) and the
    current character is W(i), then the current
    character should be tagged T(i)
  • The features will be instantiations of a
    pre-defined set of feature templates

19
Training the maximum entropy tagger (2)
  • Feature templates used for this tagger
  • The current character
  • The previous (next) character and the current
    character
  • The previous (next) two characters
  • The previous character and the next character
  • The tag of the previous character
  • The tag of the character two before the current
    character
  • The maxent training process effectively assigns a
    'weight' to each feature. The weight indicates
    how effective a feature is in predicting how the
    character should be tagged.
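
  • One plausible instantiation of these templates in Python
    (the feature-string encoding is ours, not the authors'):

    def features(chars, tags, i):
        # chars: the characters of the sentence; tags: the tags
        # already assigned to positions 0..i-1; i: current position.
        def C(j):  # character at offset j from i, '_' beyond the edges
            return chars[i + j] if 0 <= i + j < len(chars) else '_'
        def T(j):  # previously assigned tag at offset j (j < 0)
            return tags[i + j] if 0 <= i + j < len(tags) else '_'
        return ['cur=' + C(0),               # current character
                'prev_cur=' + C(-1) + C(0),  # previous + current
                'cur_next=' + C(0) + C(1),   # current + next
                'prev2=' + C(-2) + C(-1),    # previous two characters
                'next2=' + C(1) + C(2),      # next two characters
                'prev_next=' + C(-1) + C(1), # previous + next
                'tag1=' + T(-1),             # tag of previous character
                'tag2=' + T(-2)]             # tag two before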

20
Training the transformation-based tagger (Brill
1995)
  • The tagger learns a ranked set of rules from a
    pre-defined set of rule templates
  • After training, the rules it learned are applied
    to the input during testing
  • A sampling of the type of rule templates used to
    learn rules
  • Change tag a to tag b when
  • The preceding (following) character is tagged z.
  • The preceding character is tagged z and the
    following character is tagged w.
  • The preceding (following) character is c.
  • One of the two preceding (following) characters
    is c.
  • The current character is c and the preceding
    (following) character is tagged z.
  • where a, b, z and w are variables over the set of
    four tags (LL, RR, LR, MM)
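
  • As a sketch (not Brill's actual implementation), applying one
    learned rule might look like this; only the NEXTTAG trigger is
    shown, and each rule fires on a frozen copy of the input tags:

    def apply_rule(chars, tags, rule):
        # rule = (a, b, trigger, z): change tag a to tag b when the
        # trigger holds, e.g. ('RR', 'MM', 'NEXTTAG', 'RR').
        a, b, trigger, z = rule
        out = list(tags)
        for i in range(len(tags)):
            if (trigger == 'NEXTTAG' and tags[i] == a
                    and i + 1 < len(tags) and tags[i + 1] == z):
                out[i] = b
        return out

    def apply_rules(rules, chars, tags):
        # Apply the ranked rules in order, each to the previous output.
        for rule in rules:
            tags = apply_rule(chars, tags, rule)
        return tags

    # apply_rule(list('xx'), ['RR', 'RR'], ('RR', 'MM', 'NEXTTAG', 'RR'))
    # -> ['MM', 'RR'], the top transformation on the next slide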

21
Top Five Transformations Learned
  • RR → MM  NEXTTAG RR        (e.g. RR RR ⇒ MM RR)
  • LL → LR  NEXTTAG LL        (e.g. LL LL ⇒ LR LR)
  • LL → LR  NEXTTAG LR
  • MM → RR  NEXTBIGRAM LR LR
  • RR → LR  PREVBIGRAM RR LR  (e.g. RR LR RR ⇒ RR LR LR)

22
Organization of the presentation
  • The Chinese word segmentation problem
  • Recasting the problem
  • Supervised machine learning approach combining
    classifiers
  • A maximum entropy tagger
  • A transformation-based tagger
  • Experiments
  • Conclusion and future work

23
Three experiments
  • Training corpus size: 238K words (405K hanzi)
  • Testing corpus size: 13K words (22K hanzi)

24
Experiment I
  • Two sub-experiments
  • A forward maximum-matching algorithm with a
    dictionary compiled from the training corpus.
  • 497 new words in the testing corpus (3.95%).
  • The same algorithm with a dictionary compiled
    from both the training and testing corpora.
  • No new words.

25
Experiment II
  • The maximum entropy tagger only

26
Experiment III
  • Tag the testing corpus with the maximum entropy
    tagger
  • Clean up the output with the transformation-based
    tagger

27
Results
    tagger(s)   tagging accuracy (%)    segmentation accuracy, testing (%)
                training    testing     p        r        f
    1a          n/a         n/a         87.34    92.34    89.77
    1b          n/a         n/a         94.51    95.80    95.15
    2           97.55       95.95       94.90    94.88    94.89
    3           97.81       96.07       95.21    95.13    95.17
  • 1a: maximum matching algorithm applied to testing
    data with new words
  • 1b: maximum matching algorithm applied to testing
    data without new words
  • 2: maximum entropy tagger
  • 3: maximum entropy tagger combined with the
    transformation-based tagger
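
  • Here p and r are segmentation precision and recall, and f is
    their harmonic mean (the balanced F-score), e.g. for row 2:

    p, r = 94.90, 94.88
    f = 2 * p * r / (p + r)   # 94.89, as in the table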

28
Conclusions
  • The maximum entropy model is effective when
    applied to Chinese word segmentation
  • The transformation-based tagger improves the
    tagging accuracy by only 0.12%, but it does clean
    up some tagging inconsistencies. The tagging
    errors are reduced by 3.0%. The segmentation
    accuracy improves by 0.28% (F-score).

29
Residual issues
  • If we use only two tags to label the characters,
    one for characters that start a word and one for
    those that do not, there will be no tagging
    inconsistency. But the segmentation accuracy
    drops to about 94.3%
  • Will more training data help? We will try the
    segmenter on the Rocling corpus.

30
Thank You
  • ???????????????????????

31
The Chinese word segmentation problem
  • ?????
  • ? ?? ? ??
  • ?????
  • ?? ? ? ??
  • ?? ? ? ? ??

32
New Results
    tagger(s)   segmentation accuracy, testing (%)
                p        r        f
    1a          87.34    92.34    89.77
    1b          94.51    95.80    95.15
    2           94.90    94.88    94.89
    3           95.21    95.13    95.17

33
Example
  • The Same Sentence Tagged by the Maximum Entropy
    Tagger (Note the Tagging Inconsistency)
  • ?_LL ?_RR ?_LL ?_RR ?_LR ?_LL ?_RR
  • ?_LL ?_RR ?_LL ?_RR ?_LL ?_RR ?_LR
  • ?_LL ,_LR ?_LL ?_RR ?_LL ?_RR ?_LL
  • ?_RR ?_LL ?_RR ?_LL ?_RR ?_LL
  • ?_RR ,_LR ?_LL ?_RR ?_LL ?_RR ?_LL ?_RR
  • ?_LL ?_RR ?_LL ?_RR ?_LL ?_RR ?_LR
  • ?_LL ?_RR ?_LR