1. A Fast Deterministic Parser for Chinese
- Mengqiu Wang, Kenji Sagae and Teruko Mitamura
- Language Technologies Institute
- School of Computer Science
- Carnegie Mellon University
2. Outline of the talk
- Background
- Deterministic parsing model
- Classifier and feature selection
- POS tagging
- Experiment and results
- Discussion and future work
- Conclusion
3. Background
- Constituency parsing is one of the most fundamental tasks in NLP.
- The state-of-the-art accuracy previously reported for Chinese constituency parsing reaches precision and recall in the low 80s (%) with automatically generated POS tags.
- Most of the parsing literature reports only accuracy; efficiency is typically ignored.
- In practice, however, parsers are often deemed too slow for many NLP applications (e.g., IR, QA, web-based IX).
4. Deterministic Parsing Model
- Originally developed for English in Sagae and Lavie 2005.
- Input
- By convention in deterministic parsing, input sentences (Chinese in our case) are assumed to be already segmented and POS tagged.[1]
- Main data structures
- A queue, which stores the input word-POS pairs
- A stack, which holds partial parse trees
- Trees are lexicalized; we use the same head-finding rules as Bikel 2004.
- The parser performs binary shift-reduce actions based on classifier decisions (a sketch of the loop follows this slide).
- Example
1. We perform our own POS tagging based on SVM.
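Below is a minimal sketch of this shift-reduce loop, assuming a hypothetical Node class and externally supplied classify and find_head functions; it illustrates the control flow, not the authors' actual implementation:

from collections import deque

class Node:
    """A lexicalized (partial) parse tree node."""
    def __init__(self, label, word, children=None):
        self.label = label              # POS tag or nonterminal symbol
        self.word = word                # lexical head word
        self.children = children or []

def parse(tagged_words, classify, find_head):
    """Deterministic shift-reduce loop.

    tagged_words -- [(word, pos), ...], segmented and POS-tagged input
    classify     -- maps the parser state to ('SHIFT',) or
                    ('REDUCE', arity, label); one classifier call per action
    find_head    -- picks the head child of a new constituent
                    (stand-in for the Bikel 2004 head-finding rules)
    """
    queue = deque(Node(pos, word) for word, pos in tagged_words)
    stack = []
    # Terminate when the queue is empty and a single tree remains.
    while queue or len(stack) > 1:
        action = classify(stack, queue)
        if action[0] == 'SHIFT':
            stack.append(queue.popleft())
        else:
            _, arity, label = action    # ('REDUCE', arity, label)
            children = stack[-arity:]
            del stack[-arity:]
            head = find_head(label, children)
            stack.append(Node(label, head.word, children))
    return stack[0]

Each word is shifted exactly once and each reduce builds one constituent, so the number of actions grows roughly linearly with sentence length (cf. Sagae and Lavie 2005), which is where the parser's speed comes from.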
5. Deterministic Parsing Model (cont.)
- Input sentence
- 布朗/NR (Brown/proper noun) 访问/VV (visits/verb) 上海/NR (Shanghai/proper noun)
- Initial parser state
- Stack: empty
- Queue: (NR 布朗) (VV 访问) (NR 上海)
6. Deterministic Parsing Model (cont.)
- Classifier output, Action 1: Shift
- Parser state
- Stack: (NR 布朗)
- Queue: (VV 访问) (NR 上海)
7. Deterministic Parsing Model (cont.)
- Action 2: Reduce the top item on the stack to an NP node, with (NR 布朗) as the head
- Parser state
- Stack: (NP (NR 布朗))
- Queue: (VV 访问) (NR 上海)
8. Deterministic Parsing Model (cont.)
- Action 3: Shift
- Parser state
- Stack: (NP (NR 布朗)) (VV 访问)
- Queue: (NR 上海)
9. Deterministic Parsing Model (cont.)
- Action 4: Shift
- Parser state
- Stack: (NP (NR 布朗)) (VV 访问) (NR 上海)
- Queue: empty
10. Deterministic Parsing Model (cont.)
- Action 5: Reduce the top item on the stack to an NP node, with (NR 上海) as the head
- Parser state
- Stack: (NP (NR 布朗)) (VV 访问) (NP (NR 上海))
- Queue: empty
11. Deterministic Parsing Model (cont.)
- Action 6: Reduce the top two items on the stack to a VP node, with (VV 访问) as the head
- Parser state
- Stack: (NP (NR 布朗)) (VP (VV 访问) (NP (NR 上海)))
- Queue: empty
12. Deterministic Parsing Model (cont.)
- Action 7: Reduce the top two items on the stack to an IP node, taking the head of the VP subtree, (VV 访问), as the head
- Parser state
- Stack: (IP (NP (NR 布朗)) (VP (VV 访问) (NP (NR 上海))))
- Queue: empty
13. Deterministic Parsing Model (cont.)
- Parsing terminates when the queue is empty and the stack contains only one item.
- Final parse tree (the code below replays these seven actions):
  (IP (NP (NR 布朗))
      (VP (VV 访问)
          (NP (NR 上海))))
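To tie the walkthrough together, the same seven actions can be replayed through the parse sketch from slide 4. The scripted action list and the toy_head rule are illustrative stand-ins; in the real parser the actions come from the trained classifier and the heads from Bikel's rules:

def toy_head(label, children):
    # Crude stand-in for the head rules: prefer a verbal child.
    for c in children:
        if c.label in ('VV', 'VP'):
            return c
    return children[-1]

actions = iter([
    ('SHIFT',),             # 1: shift (NR 布朗)
    ('REDUCE', 1, 'NP'),    # 2: NP over (NR 布朗)
    ('SHIFT',),             # 3: shift (VV 访问)
    ('SHIFT',),             # 4: shift (NR 上海)
    ('REDUCE', 1, 'NP'),    # 5: NP over (NR 上海)
    ('REDUCE', 2, 'VP'),    # 6: VP, head (VV 访问)
    ('REDUCE', 2, 'IP'),    # 7: IP, head (VV 访问)
])
tree = parse([('布朗', 'NR'), ('访问', 'VV'), ('上海', 'NR')],
             classify=lambda stack, queue: next(actions),
             find_head=toy_head)

def bracket(n):
    if not n.children:
        return '(%s %s)' % (n.label, n.word)
    return '(%s %s)' % (n.label, ' '.join(map(bracket, n.children)))

print(bracket(tree))  # (IP (NP (NR 布朗)) (VP (VV 访问) (NP (NR 上海))))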
14. Classifiers
- Classification is the most important part of deterministic parsing: it decides every action and thereby the constituent label of each node in the final parse tree.
- We experimented with four different classifiers (a pluggable sketch follows this list):
- SVM classifier
- -- finds a hyperplane that gives the maximum soft margin, minimizing the expected risk.
- Maximum entropy classifier
- -- estimates a set of parameters that maximizes entropy over the distributions satisfying constraints that force the model to best account for the training data.
- Decision tree classifier
- -- we used C4.5 (Quinlan 1993).
- Memory-based learning
- -- a kNN classifier; a lazy learner with short training time.
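Since the action decision is an ordinary multi-class classification problem, the learner is pluggable. The sketch below uses scikit-learn stand-ins purely for illustration (the paper used separate SVM, maxent, C4.5, and TiMBL toolkits); composite actions such as ('REDUCE', 2, 'NP') would be encoded as string labels like 'REDUCE-2-NP':

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def make_action_classifier(kind='svm'):
    """Feature dicts in, parser actions out; the learner is swappable."""
    learner = {
        'svm': SVC(kernel='poly', degree=2),         # degree-2 polynomial kernel
        'maxent': LogisticRegression(max_iter=1000),
        'dtree': DecisionTreeClassifier(),           # in place of C4.5
        'mbl': KNeighborsClassifier(n_neighbors=5),  # in place of TiMBL (kNN)
    }[kind]
    return make_pipeline(DictVectorizer(), learner)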
15. Features
- The features we use are either distributionally derived or linguistically motivated.
- Each feature carries information about the context of a particular parser state.
- We denote the top item of the stack as S(1), the second item from the top as S(2), and so on; similarly, the first item of the queue is Q(1), the second Q(2), and so on.
16. Features
- Boolean features indicating the presence of punctuation and queue emptiness, plus the last parser action, the number of words in constituents, head words and their POS, the root nonterminal symbol, dependencies among tree nodes, tree-path information, and relative position (a sketch of feature extraction follows).
- Rhythmic features (Sun and Jurafsky 2004).
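A hedged sketch of feature extraction over the parser state from the slide-4 sketch; the template names (S1.label, Q1.word, ...) are illustrative, not the paper's exact feature set:

def count_words(node):
    """Number of words spanned by a constituent."""
    return 1 if not node.children else sum(map(count_words, node.children))

def extract_features(stack, queue, last_action):
    """Context features for one parser state (illustrative subset)."""
    feats = {'last.action': str(last_action), 'queue.empty': not queue}
    for i in (1, 2, 3):        # S(1), S(2), S(3): top of the stack downward
        s = stack[-i] if len(stack) >= i else None
        feats['S%d.label' % i] = s.label if s else '<none>'
        feats['S%d.head' % i] = s.word if s else '<none>'
        feats['S%d.words' % i] = count_words(s) if s else 0
    for i in (1, 2):           # Q(1), Q(2): front of the queue backward
        q = queue[i - 1] if len(queue) >= i else None
        feats['Q%d.pos' % i] = q.label if q else '<none>'
        feats['Q%d.word' % i] = q.word if q else '<none>'
    return feats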
17. POS tagging
- In our model, POS tagging is treated as a separate problem and is done prior to parsing.
- We care about the parser's performance in realistic situations, i.e., with automatically generated POS tags.
- We implemented a simple two-pass POS tagging model based on SVM, which achieves 92.5% accuracy (sketched below).
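One plausible reading of the two-pass scheme, sketched under stated assumptions (the window sizes and feature names are guesses, and svm_pass1/svm_pass2 are hypothetical trained classifiers): pass 1 tags each word from word-window features, and pass 2 re-tags with the pass-1 tags of neighboring words added as features.

def tagger_features(words, i, prev_tags=None):
    """Window features for word i; pass 2 also sees pass-1 neighbor tags."""
    f = {'w0': words[i],
         'w-1': words[i - 1] if i > 0 else '<s>',
         'w+1': words[i + 1] if i + 1 < len(words) else '</s>'}
    if prev_tags is not None:   # second pass: add neighbors' pass-1 tags
        f['t-1'] = prev_tags[i - 1] if i > 0 else '<s>'
        f['t+1'] = prev_tags[i + 1] if i + 1 < len(words) else '</s>'
    return f

def two_pass_tag(words, svm_pass1, svm_pass2):
    pass1 = [svm_pass1.predict([tagger_features(words, i)])[0]
             for i in range(len(words))]
    return [svm_pass2.predict([tagger_features(words, i, pass1)])[0]
            for i in range(len(words))]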
18. Experiments
- Standard Chinese Treebank data splits
- Training set: sections 1-270 of CTB 2.0 (3,484 sentences, 84,873 words)
- Development set: sections 301-326 of CTB 2.0
- Test set: sections 271-300 of CTB 2.0
- Total: 99,629 words, about 1/10 the size of the English Penn Treebank
- Standard corpus preparation
- Empty nodes were removed
- Functional labels of nonterminal nodes were removed, e.g., NP-Subj -> NP
- For scoring we used the evalb[1] program; labeled recall (LR), labeled precision (LP), and their harmonic mean (F1) are reported (computed as sketched below).
1. http://nlp.cs.nyu.edu/evalb
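For reference, here is how evalb-style labeled precision, recall, and F1 are computed from bracketings, reusing the Node class from the slide-4 sketch (a simplified illustration; the real evalb program applies extra normalizations):

from collections import Counter

def labeled_spans(node, start=0, out=None):
    """Collect (label, start, end) for each nonterminal; returns span end."""
    if out is None:
        out = []
    if not node.children:       # a tagged word spans one position
        return start + 1, out
    end = start
    for child in node.children:
        end, _ = labeled_spans(child, end, out)
    out.append((node.label, start, end))
    return end, out

def evalb_scores(gold_tree, test_tree):
    _, gold = labeled_spans(gold_tree)
    _, test = labeled_spans(test_tree)
    matched = sum((Counter(gold) & Counter(test)).values())
    lp = matched / len(test)                 # labeled precision
    lr = matched / len(gold)                 # labeled recall
    return lp, lr, 2 * lp * lr / (lp + lr)   # F1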
19. Results
- Comparison of classifiers on the development set using gold-standard POS [table omitted]
20. Classifier Ensemble
- Using stacked-classifier techniques, we improved performance on the dev set from 86.9% LR and 87.9% LP to 90.3% LR and 90.5% LP
- a 3.4-point improvement in LR and a 2.6-point improvement in LP over the single SVM model (stacking is sketched below).
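A minimal sketch of one common form of stacking, assuming the make_action_classifier helper from the slide-14 sketch: the base learners' predicted actions are appended to the feature dict seen by a top-level classifier. The paper's exact ensemble recipe may differ.

def stacked_features(base_models, feats):
    """Base learners' predicted actions become extra features."""
    out = dict(feats)
    for name, model in base_models.items():
        out['pred.' + name] = str(model.predict([feats])[0])
    return out

# e.g., train base models first, then the top-level SVM on stacked features:
#   bases = {k: make_action_classifier(k).fit(X_base, y_base)
#            for k in ('maxent', 'dtree', 'mbl')}
#   top = make_action_classifier('svm')
#   top.fit([stacked_features(bases, f) for f in X_top], y_top)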
21. Comparison with related work
- Results on the test set using automatically generated POS [table omitted]
22. Comparison with related work (cont.)
- Comparison of parsing speed [table omitted]
23. Discussion and future work
- The deterministic parsing framework opens up many opportunities for continuous improvement through applying machine learning techniques
- e.g., experimenting with other classifiers and classifier ensemble techniques
- Experimenting with degree-2 features for the maxent model, which may come close to the SVM model's performance at higher speed
24. Conclusion
- We presented the first work on a deterministic approach to Chinese constituency parsing.
- We achieved results comparable to the state of the art in probabilistic Chinese parsing.
- We demonstrated that deterministic parsing is a viable approach to fast and accurate Chinese parsing.
- Very fast parsing becomes possible for speed-critical applications, with some tradeoff in accuracy.
- Advances in machine learning techniques can be applied directly to the parsing problem, opening up many opportunities for further improvement.
25. References
- Daniel M. Bikel and David Chiang. 2000. Two statistical parsing models applied to the Chinese Treebank. In Proceedings of the Second Chinese Language Processing Workshop.
- Daniel M. Bikel. 2004. On the Parameter Space of Generative Lexicalized Statistical Parsing Models. Ph.D. thesis, University of Pennsylvania.
- David Chiang and Daniel M. Bikel. 2002. Recovering latent information in treebanks. In Proceedings of the 19th International Conference on Computational Linguistics.
- Michael John Collins. 1999. Head-driven Statistical Models for Natural Language Parsing. Ph.D. thesis, University of Pennsylvania.
- Walter Daelemans, Jakub Zavrel, Ko van der Sloot, and Antal van den Bosch. 2004. TiMBL: Tilburg Memory-Based Learner, version 5.1, reference guide. Technical Report 04-02, ILK Research Group, Tilburg University.
- Pascale Fung, Grace Ngai, Yongsheng Yang, and Benfeng Chen. 2004. A maximum-entropy Chinese parser augmented by transformation-based learning. ACM Transactions on Asian Language Information Processing, 3(2):159-168.
- Mary Hearne and Andy Way. 2004. Data-oriented parsing and the Penn Chinese Treebank. In Proceedings of the First International Joint Conference on Natural Language Processing.
- Zhengping Jiang. 2004. Statistical Chinese parsing. Honours thesis, National University of Singapore.
- Zhang Le. 2004. Maximum Entropy Modeling Toolkit for Python and C++. Reference manual.
- Roger Levy and Christopher D. Manning. 2003. Is it harder to parse Chinese, or the Chinese Treebank? In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics.
- Xiaoqiang Luo. 2003. A maximum entropy Chinese character-based parser. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing.
- David M. Magerman. 1994. Natural Language Parsing as Statistical Pattern Recognition. Ph.D. thesis, Stanford University.
- J. R. Quinlan. 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann.
- Kenji Sagae and Alon Lavie. 2005. A classifier-based parser with linear run-time complexity. In Proceedings of the Ninth International Workshop on Parsing Technology.
- Deyi Xiong, Shuanglong Li, Qun Liu, Shouxun Lin, and Yueliang Qian. 2005. Parsing the Penn Chinese Treebank with semantic knowledge. In Proceedings of the Second International Joint Conference on Natural Language Processing.
26. Thank you!