Title: A Classifier-based Deterministic Parser for Chinese
1. A Classifier-based Deterministic Parser for Chinese
- Mengqiu Wang
- Advisor: Prof. Teruko Mitamura
- Joint work with Kenji Sagae
2. Outline of the talk
- Background
- Deterministic parsing model
- Classifier and feature selection
- POS tagging
- Experiment and results
- Discussion and future work
- Conclusion
3. Background
- Constituency parsing is one of the most fundamental tasks in NLP.
- State-of-the-art Chinese constituency parsing achieves precision and recall in the low 80s (in percent) using automatically generated POS tags.
- Most of the parsing literature reports only accuracy; efficiency is typically ignored.
- But in reality, parsers are deemed too slow for many NLP applications (e.g. IR).
4. Deterministic Parsing Model
- Originally developed in Sagae and Lavie 2005
- Input
- By convention, deterministic parsing assumes that input sentences (Chinese in our case) are already segmented and POS tagged [1]
- Main data structures
- A queue, which stores input word-POS pairs
- A stack, which holds partial parse trees
- Trees are lexicalized. We use the same head-finding rules as Bikel 2004
- The parser performs binary shift-reduce actions based on classifier decisions (a minimal sketch follows below)
- Example on the next slides
1. We perform our own POS tagging, based on SVM
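To make the shift-reduce mechanics concrete, here is a minimal Python sketch of the control loop described above. The classify_action and find_head callbacks and the Tree class are hypothetical stand-ins, not the actual implementation:

    # Minimal sketch of the deterministic shift-reduce loop.
    from collections import deque

    class Tree:
        def __init__(self, label, children=None, head=None):
            self.label = label             # nonterminal or POS label
            self.children = children or []
            self.head = head or self       # lexical head (leaf defaults to itself)

    def parse(tagged_words, classify_action, find_head):
        # Queue of word-POS pairs; stack of partial parse trees.
        queue = deque(Tree(pos, [word]) for word, pos in tagged_words)
        stack = []
        # Terminate when the queue is empty and one tree spans the sentence.
        while queue or len(stack) > 1:
            action = classify_action(stack, queue)  # ("SHIFT",) or ("REDUCE", label, n)
            if action[0] == "SHIFT":
                stack.append(queue.popleft())
            else:
                _, label, n = action               # pop n items, attach under new node
                children = stack[-n:]
                del stack[-n:]
                node = Tree(label, children)
                node.head = find_head(node)        # head-finding rules (Bikel 2004)
                stack.append(node)
        return stack[0]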
5. Deterministic Parsing Model (cont.)
- Input sentence:
- 布朗/NR (Brown/proper noun) 访问/VV (visits/verb) 上海/NR (Shanghai/proper noun)
- Initial parser state:
- Stack: empty
- Queue: 布朗/NR 访问/VV 上海/NR
6. Deterministic Parsing Model (cont.)
- Action 1: Shift
- Parser state:
- Stack (top first): (NR 布朗)
- Queue: 访问/VV 上海/NR
7. Deterministic Parsing Model (cont.)
- Action 2: Reduce the first item on the stack to an NP node, with node (NR 布朗) as the head
- Parser state:
- Stack (top first): NP (NR 布朗)
- Queue: 访问/VV 上海/NR
8. Deterministic Parsing Model (cont.)
- Action 3: Shift
- Parser state:
- Stack (top first): (VV 访问), NP (NR 布朗)
- Queue: 上海/NR
9. Deterministic Parsing Model (cont.)
- Action 4: Shift
- Parser state:
- Stack (top first): (NR 上海), (VV 访问), NP (NR 布朗)
- Queue: empty
10. Deterministic Parsing Model (cont.)
- Action 5: Reduce the first item on the stack to an NP node, with node (NR 上海) as the head
- Parser state:
- Stack (top first): NP (NR 上海), (VV 访问), NP (NR 布朗)
- Queue: empty
11. Deterministic Parsing Model (cont.)
- Action 6: Reduce the first two items on the stack to a VP node, with node (VV 访问) as the head
- Parser state:
- Stack (top first): VP (VV 访问), NP (NR 布朗)
- Queue: empty
12. Deterministic Parsing Model (cont.)
- Action 7: Reduce the first two items on the stack to an IP node, taking the head node of the VP subtree, (VV 访问), as the head
- Parser state:
- Stack (top first): IP (VV 访问)
- Queue: empty
13. Deterministic Parsing Model (cont.)
- Parsing terminates when the queue is empty and the stack contains only one item
- Final parse tree:
- (IP (NP (NR 布朗)) (VP (VV 访问) (NP (NR 上海))))
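The whole walkthrough can be replayed through the parse() sketch from slide 4 by feeding it the seven actions as a fixed oracle in place of a classifier; toy_head is a crude hypothetical stand-in for the Bikel-style head rules:

    # Replay the seven walkthrough actions through the parse() sketch.
    def toy_head(node):
        # Toy head rule: a VP is headed by its leftmost child (the verb);
        # NP and IP by the head of their last child.
        idx = 0 if node.label == "VP" else -1
        return node.children[idx].head

    actions = iter([
        ("SHIFT",),               # 布朗/NR moves to the stack
        ("REDUCE", "NP", 1),      # NP over 布朗
        ("SHIFT",),               # 访问/VV
        ("SHIFT",),               # 上海/NR
        ("REDUCE", "NP", 1),      # NP over 上海
        ("REDUCE", "VP", 2),      # VP over 访问 and NP(上海)
        ("REDUCE", "IP", 2),      # IP over NP(布朗) and the VP
    ])
    tree = parse([("布朗", "NR"), ("访问", "VV"), ("上海", "NR")],
                 classify_action=lambda stack, queue: next(actions),
                 find_head=toy_head)
    print(tree.label, tree.head.label)   # IP VV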
14. Classifiers
- Classification is the most important part of deterministic parsing.
- We experimented with four different classifiers (see the interface sketch below):
- SVM classifier: finds the hyperplane that gives the maximum soft margin, minimizing the expected risk.
- Maximum entropy classifier: estimates a set of parameters that maximize the entropy over the distributions satisfying constraints that force the model to best account for the training data.
- Decision tree classifier: we used C4.5.
- Memory-based learning: a kNN classifier and lazy learner with short training time, ideal for prototyping.
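Since the parser only consumes a predicted action, any of the four learners can sit behind one classification interface. The sketch below uses scikit-learn models as stand-ins for the actual toolkits (an illustration of the architecture, not the authors' code):

    # The parse-action predictor as an interchangeable multi-class classifier.
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.linear_model import LogisticRegression   # maxent
    from sklearn.neighbors import KNeighborsClassifier    # memory-based (kNN)
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC
    from sklearn.tree import DecisionTreeClassifier       # C4.5-like

    def make_action_classifier(kind="svm"):
        model = {
            "svm": LinearSVC(),
            "maxent": LogisticRegression(max_iter=1000),
            "dtree": DecisionTreeClassifier(),
            "mbl": KNeighborsClassifier(n_neighbors=5),
        }[kind]
        # DictVectorizer maps sparse symbolic features (headwords, POS
        # tags, labels) to the indicator vectors these learners expect.
        return make_pipeline(DictVectorizer(), model)

    # clf = make_action_classifier("maxent")
    # clf.fit(train_features, train_actions)  # train_features: list of dicts
    # action = clf.predict([extract_features(stack, queue, last_action)])[0]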
15. Features
- The features we used are distributionally derived or linguistically motivated.
- Each feature carries information about the context of a particular parse state.
- We denote the top item on the stack as S(1), the second item (from the top) as S(2), and so on. Similarly, we denote the first item on the queue as Q(1), the second as Q(2), and so on.
16. Features
- A Boolean feature indicating whether a closing punctuation mark is expected.
- A Boolean feature indicating whether the queue is empty.
- A Boolean feature indicating whether there is a comma separating S(1) and S(2).
- The last action given by the classifier, and the number of words in S(1) and S(2).
- The headword and its POS for S(1), S(2), S(3), and S(4), and the word and POS for Q(1), Q(2), Q(3), and Q(4).
- The nonterminal label of the roots of S(1) and S(2), and the number of punctuation marks in S(1) and S(2).
- Rhythmic features, and the linear distance between the headwords of S(1) and S(2).
- The number of words found so far to be dependents of the headwords of S(1) and S(2).
- The nonterminal label, POS, and headword of the immediate left and right children of the roots of S(1) and S(2).
- The most recently found word-POS pair to the left of the headwords of S(1) and S(2).
- The most recently found word-POS pair to the right of the headwords of S(1) and S(2).
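As an illustration, a partial feature extractor over the parser state might look like the following sketch; it covers only a subset of the features above, and the feature names and the Tree objects from the slide-4 sketch are assumptions:

    # Partial feature extractor over the (stack, queue) parser state.
    def extract_features(stack, queue, last_action):
        f = {"last_action": last_action, "queue_empty": len(queue) == 0}
        for i in (1, 2, 3, 4):                 # S(1)..S(4) and Q(1)..Q(4)
            s = stack[-i] if len(stack) >= i else None
            q = queue[i - 1] if len(queue) >= i else None
            f["S%d.headword" % i] = s.head.children[0] if s else "<none>"
            f["S%d.headpos" % i] = s.head.label if s else "<none>"
            f["Q%d.word" % i] = q.children[0] if q else "<none>"
            f["Q%d.pos" % i] = q.label if q else "<none>"
        for i in (1, 2):                       # root labels of S(1), S(2)
            f["S%d.label" % i] = stack[-i].label if len(stack) >= i else "<none>"
        return f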
17. POS tagging
- In our model, POS tagging is treated as a separate problem and is done prior to parsing.
- But we care about the performance of the parser in realistic situations, with automatically generated POS tags.
- We implemented a simple 2-pass POS tagging model based on SVM, which achieved 92.5% accuracy.
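The slide does not spell out the 2-pass scheme; one plausible reading, sketched below under that assumption, is a first pass that tags left to right with only left-context tags available, and a second pass that re-tags with first-pass tags visible on both sides:

    # Hedged sketch of one possible "2-pass" tagging scheme (an assumption,
    # not the talk's spec). `classify` would be an SVM trained on these dicts.
    def context_features(words, tags, i):
        # Window of +/-2 words plus whatever tags are available.
        f = {}
        for d in (-2, -1, 0, 1, 2):
            j = i + d
            f["w%+d" % d] = words[j] if 0 <= j < len(words) else "<pad>"
            f["t%+d" % d] = tags[j] if 0 <= j < len(tags) and tags[j] else "<pad>"
        return f

    def tag_two_pass(words, classify):
        tags = []
        for i in range(len(words)):            # pass 1: left tags only
            padded = tags + [None] * (len(words) - i)
            tags.append(classify(context_features(words, padded, i)))
        return [classify(context_features(words, tags, i))   # pass 2
                for i in range(len(words))]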
18. Experiments
- Standard data collection:
- Training set: sections 1-270 of the Penn Chinese Treebank (3,484 sentences, 84,873 words)
- Development set: sections 301-326
- Testing set: sections 271-300
- Total: 99,629 words, about 1/10 the size of the English Penn Treebank
- Standard corpus preparation:
- Empty nodes were removed
- Functional labels of nonterminal nodes were removed, e.g. NP-Subj -> NP
- For scoring we used the evalb [1] program. Labeled recall (LR), labeled precision (LP), and F1 (their harmonic mean) are reported.
1. http://nlp.cs.nyu.edu/evalb
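For reference, F1 is the harmonic mean of labeled precision and recall; a quick check against the SVM row of the next slide's table:

    # F1 as the harmonic mean of labeled precision (LP) and recall (LR).
    def f1(lp, lr):
        return 2 * lp * lr / (lp + lr)

    print(round(f1(87.9, 86.9), 1))   # 87.4, matching the SVM row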
19. Results
- Comparison of classifiers on the development set, using gold-standard POS

Model    Classification Accuracy    LR      LP      F1      Fail    Time
SVM      94.3                       86.9    87.9    87.4    0       3m 19s
Maxent   92.6                       84.1    85.2    84.6    5       0m 21s
DTree1   92.0                       78.8    80.3    79.5    42      0m 12s
DTree2   N/A                        81.6    83.6    82.6    30      0m 18s
MBL      90.6                       74.3    75.2    74.7    2       16m 11s
20. Classifier Ensemble
- Using stacked-classifier techniques, we improved performance on the dev set to an LR of 90.3% and an LP of 90.5%, a 3.4-point improvement in LR and a 2.6-point improvement in LP over the SVM model.
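The talk does not detail the stacking scheme, so the sketch below shows one standard recipe under that assumption: the base classifiers' predicted actions are appended to the feature set of a meta-classifier (reusing the hypothetical extract_features from the feature sketch):

    # Generic stacking sketch: base classifiers' predictions become extra
    # features for a meta-classifier (one standard recipe, assumed here).
    def stacked_features(stack, queue, last_action, base_models):
        f = extract_features(stack, queue, last_action)
        base = dict(f)                          # base models see plain features
        for name, model in base_models.items(): # e.g. {"dtree": ..., "maxent": ...}
            f["pred_" + name] = model.predict([base])[0]
        return f

    # A meta-classifier (e.g. the SVM) is then trained on stacked_features
    # output instead of the plain feature dicts.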
21. Comparison with related work
- Results on the test set, using automatically generated POS tags.
22. Comparison with related work (cont.)
- Comparison of parsing speed:

Model            Runtime
Bikel            54m 6s
Levy & Manning   8m 12s
DTree            0m 14s
Maxent           0m 24s
SVM              3m 50s
23. Discussion and future work
- Among the classifiers, SVM has high accuracy but low speed; DTree has lower accuracy but great speed; Maxent sits between the two in terms of accuracy and speed.
- It is desirable to bring the two ends of the spectrum closer, i.e. increase the accuracy of the DTree classifier and lower the computational cost of SVM classification.
- Action items:
- Apply boosting techniques (AdaBoost, random forests, bagging, etc.) to DTree. A preliminary attempt didn't yield better performance and calls for further investigation (see the sketch after this list).
- Feature selection (especially on lexical items) to reduce the computational cost of classification.
- Re-implement the parser in C (avoiding invoking external processes and expensive I/O).
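As a starting point for the boosting action item, the sketch below shows how scikit-learn ensemble wrappers (stand-ins, not the toolkits used in the talk) would slot into the same classification interface:

    # Ensemble candidates for the "boost the DTree" action item; the
    # defaults of these wrappers use tree-based base learners.
    from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                                  RandomForestClassifier)

    boosted = AdaBoostClassifier(n_estimators=50)      # boosted trees
    bagged = BaggingClassifier(n_estimators=25)        # bagged trees
    forest = RandomForestClassifier(n_estimators=100)  # random forest
    # Any of these can replace DecisionTreeClassifier() in the
    # make_action_classifier() sketch from slide 14.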
24. Conclusion
- We implemented a classifier-based deterministic constituency parser for Chinese.
- We achieved results comparable to the state of the art in Chinese parsing.
- Very fast parsing is made possible for speed-critical applications, with some tradeoff in accuracy.
- Advances in machine learning techniques can be applied directly to the parsing problem, which opens up many opportunities for further improvement.
25. References
- Daniel M. Bikel and David Chiang. 2000. Two statistical parsing models applied to the Chinese Treebank. In Proceedings of the Second Chinese Language Processing Workshop.
- Daniel M. Bikel. 2004. On the Parameter Space of Generative Lexicalized Statistical Parsing Models. Ph.D. thesis, University of Pennsylvania.
- David Chiang and Daniel M. Bikel. 2002. Recovering latent information in treebanks. In Proceedings of the 19th International Conference on Computational Linguistics.
- Michael John Collins. 1999. Head-driven Statistical Models for Natural Language Parsing. Ph.D. thesis, University of Pennsylvania.
- Walter Daelemans, Jakub Zavrel, Ko van der Sloot, and Antal van den Bosch. 2004. TiMBL: Tilburg memory based learner, version 5.1, reference guide. Technical Report 04-02, ILK Research Group, Tilburg University.
- Pascale Fung, Grace Ngai, Yongsheng Yang, and Benfeng Chen. 2004. A maximum-entropy Chinese parser augmented by transformation-based learning. ACM Transactions on Asian Language Information Processing, 3(2):159-168.
- Mary Hearne and Andy Way. 2004. Data-oriented parsing and the Penn Chinese Treebank. In Proceedings of the First International Joint Conference on Natural Language Processing.
- Zhengping Jiang. 2004. Statistical Chinese parsing. Honours thesis, National University of Singapore.
- Zhang Le. 2004. Maximum Entropy Modeling Toolkit for Python and C++. Reference Manual.
- Roger Levy and Christopher D. Manning. 2003. Is it harder to parse Chinese, or the Chinese Treebank? In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics.
- Xiaoqiang Luo. 2003. A maximum entropy Chinese character-based parser. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing.
- David M. Magerman. 1994. Natural Language Parsing as Statistical Pattern Recognition. Ph.D. thesis, Stanford University.
- Kenji Sagae and Alon Lavie. 2005. A classifier-based parser with linear run-time complexity. In Proceedings of the Ninth International Workshop on Parsing Technology.
- Deyi Xiong, Shuanglong Li, Qun Liu, Shouxun Lin, and Yueliang Qian. 2005. Parsing the Penn Chinese Treebank with semantic knowledge. In International Joint Conference on Natural Language Processing 2005.
26. Thank you!