Title: Hindi Parsing
1. Hindi Parsing
- Samar Husain
- LTRC, IIIT-Hyderabad, India
2. Outline
- Introduction
- Grammatical framework
- Two stage parsing
- Evaluation
- Two stage constraint based parsing
- Integrated data driven parsing
- Two stage data driven parsing
3. Introduction
- Broad coverage parser for Hindi
- Crucial for MT systems, IE, co-reference resolution, etc.
- Attempt to make a hybrid parser
- Grammatical framework: Dependency
4. Introduction
- Levels of analysis before parsing
- Morphological analysis (Morph Info.)
- Analysis in local context (POS tagging, chunking, case marker/postposition computation)
- We parse after the above processing is done.
5. Computational Paninian Grammar (CPG)
- Based on Panini's grammar
- Inspired by an inflectionally rich language (Sanskrit)
- A dependency based analysis (Bharati et al., 1995a)
- Earlier parsing approaches for Hindi (Bharati et al., 1993, 1995b, 2002)
6. CPG (The Basic Framework)
- Treats a sentence as a set of modifier-modified relations
- A sentence has a primary modified, the root (which is generally a verb)
- Gives us the framework to identify these relations
- Relations between a noun constituent and the verb are called karakas
- karakas are syntactico-semantic in nature
- Syntactic cues help us in identifying the karakas
7. The karta and karma karakas
- The boy opened the lock
- k1: karta
- k2: karma
- karta and karma usually correspond to agent and theme respectively, but not always
- karakas are direct participants in the activity denoted by the verb
- For the complete list of dependency relations, see (Begum et al., 2008)
[Dependency tree: open --k1--> boy, open --k2--> lock]
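As a small illustration (a sketch in Python; the data structure is mine, not the parser's), the tree above is just a pair of labelled head-dependent arcs:

```python
# Illustrative arc list for the tree above: each dependent is mapped to
# (head, karaka label).
tree = {
    "boy":  ("open", "k1"),   # karta (roughly the agent)
    "lock": ("open", "k2"),   # karma (roughly the theme)
}

for dependent, (head, karaka) in tree.items():
    print(f"{head} --{karaka}--> {dependent}")
```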
8. Hindi Parsing: Approaches tried
- Two stage constraint based parsing
- Data driven parsing
- Integrated
- Two stage
9. Two stage parsing
- Basic idea
- There are two layers (stages)
- The 1st stage handles intra-clausal relations, and the 2nd stage handles inter-clausal relations
- The output of each stage is a linguistically sound partial parse that becomes the input to the next stage
10. Stage 1
- Identify intra-clausal relations
- the argument structure of the verb,
- noun-noun genitive relation,
- infinitive-verb relation,
- infinitive-noun relation,
- adjective-noun,
- adverb-verb relations,
- nominal coordination, etc.
11. Stage 2
- Identify inter-clausal relations
- subordinating conjuncts,
- coordinating conjuncts,
- relative clauses, etc.
12. How do we do this?
- Introduce a dummy __ROOT__ node as the root of the dependency tree
- Helps in giving linguistically sound partial parses
- Keeps the tree connected
- Classify the dependency tags into two sets (a sketch follows below)
- Tags that function within a clause
- Tags that relate two clauses
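A minimal Python sketch of this split and of the stage-1 fallback to __ROOT__. Only k1 and k2 appear on the earlier slides; the remaining label names, and the toy attachments for the example sentence, are illustrative assumptions rather than the actual tag set or treebank analysis.

```python
# Hypothetical split of the label set: only k1/k2 appear on the earlier
# slides; the remaining names are illustrative stand-ins.
INTRA_CLAUSAL = {"k1", "k2", "k1s", "r6"}      # relations within a clause
INTER_CLAUSAL = {"sub_conj", "ccof"}           # relations between clauses
ROOT = -1                                      # index of the dummy __ROOT__

def split_stages(n_tokens, arcs):
    """arcs: (head_index, dependent_index, label) triples of a full parse.
    Stage 1 keeps the intra-clausal arcs and attaches every still-headless
    token to __ROOT__ with a dummy label, so the partial parse remains a
    connected tree; stage 2 then only has to add the inter-clausal arcs."""
    stage1 = [a for a in arcs if a[2] in INTRA_CLAUSAL]
    attached = {d for (_, d, _) in stage1}
    stage1 += [(ROOT, t, "dummy") for t in range(n_tokens) if t not in attached]
    stage2 = [a for a in arcs if a[2] in INTER_CLAUSAL]
    return stage1, stage2

# Toy arcs for "mai ghar gayaa kyomki mai bimaar thaa" ("I went home
# because I was sick"); attachments and labels are only illustrative.
arcs = [(2, 0, "k1"), (2, 1, "k2"),          # clause 1, headed by gayaa
        (6, 4, "k1"), (6, 5, "k1s"),         # clause 2, headed by thaa
        (2, 3, "sub_conj"), (3, 6, "ccof")]  # relations between the clauses
stage1, stage2 = split_stages(7, arcs)
print(stage1)   # clause-internal arcs + gayaa/kyomki/thaa under __ROOT__
print(stage2)   # the inter-clausal arcs, added only in the 2nd stage
```

Because the 2nd stage never touches the 1st stage arcs, each stage can be trained and evaluated on its own sub-problem.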
13. An example
- mai ghar gayaa kyomki mai bimaar thaa
- I home went because I sick was
- "I went home because I was sick"
14. The parses
[Figure: (a) 1st stage output, (b) 2nd stage final parse]
15. Two stage parsing
- 1st stage
- All the clauses analyzed
- Analyzed clauses become children of __ROOT__
- Conjuncts become children of __ROOT__
- 2nd stage
- Does not modify the 1st stage analysis
- Identifies relations between 1st stage parsed sub-trees
16. Important linguistic cues that help Hindi parsing
- Nominal postpositions
- TAM (tense-aspect-modality) classes
- Morphological features
- root of the lexical item, etc.
- POS/Chunk tags
- Agreement
- Minimal semantics
- Animate-inanimate
- Human-nonhuman
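For illustration only, a hypothetical feature bundle collecting the cues listed above for a single (chunk-head) token; the field names and the tiny animacy lexicon are assumptions, not the actual feature templates used in the experiments.

```python
def node_features(token, animacy_lexicon):
    """Bundle the cue types listed above for one (chunk-head) token.
    `token` is assumed to carry the output of the pre-parsing steps:
    morph analysis, POS/chunk tags and the computed postposition/TAM."""
    return {
        "root":    token["root"],                    # morph: root of the lexical item
        "postpos": token.get("postposition", "0"),   # nominal postposition (ø -> "0")
        "tam":     token.get("tam", "0"),            # TAM class, for verbs
        "pos":     token["pos"],                     # POS tag
        "chunk":   token["chunk"],                   # chunk tag
        "gender":  token["gender"],                  # agreement features
        "number":  token["number"],
        "person":  token["person"],
        "animacy": animacy_lexicon.get(token["root"], "unknown"),  # minimal semantics
    }

# Hypothetical lexicon carrying the two semantic distinctions.
animacy_lexicon = {"rAma": "human", "KilOnA": "inanimate"}
rama = {"root": "rAma", "postposition": "ne", "pos": "NNP", "chunk": "NP",
        "gender": "m", "number": "sg", "person": "3"}
print(node_features(rama, animacy_lexicon))
```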
17. Nominal postpositions and TAM
- rAma ø mohana ko KilOnA xewA hE
- Ram Mohan-DAT toy give
- "Ram gives a toy to Mohan"
- rAma ne mohana ko KilOnA xiyA
- Ram-ERG Mohan-DAT toy gave
- "Ram gave Mohan a toy"
- rAma ko mohana ko KilOnA xenA padZA
- Ram-DAT Mohan-DAT toy had to give
- "Ram had to give Mohan a toy"
- The TAM dictates the postposition that appears on the noun rAma (summarised in the sketch below)
- Related concept in CPG: verb frames and transformation rules (Bharati et al., 1995a)
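The pattern in these examples can be read as a small demand table: the verb's TAM class decides which postposition the karta (k1) may carry. A hedged sketch, with the table built only from the three sentences above; the representation is my own, not the CPG verb-frame format.

```python
# karta (k1) postposition licensed by the verb's TAM, read off the three
# examples: habitual 'wA_hE' -> ø, perfective 'yA' -> 'ne',
# obligational 'nA_padZA' -> 'ko'.
K1_POSTPOSITION_BY_TAM = {
    "wA_hE":    "0",    # rAma ø  ... xewA hE
    "yA":       "ne",   # rAma ne ... xiyA
    "nA_padZA": "ko",   # rAma ko ... xenA padZA
}

def k1_candidates(nouns, verb_tam):
    """Keep only the nouns whose postposition matches what this TAM demands
    of the karta; a constraint of this kind prunes k1 candidates."""
    wanted = K1_POSTPOSITION_BY_TAM.get(verb_tam, "0")
    return [noun for noun, postpos in nouns if postpos == wanted]

print(k1_candidates([("rAma", "ne"), ("mohana", "ko")], "yA"))  # ['rAma']
```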
18. Agreement
- rAma ø mohana ko KilOnA xewA hE
- "Ram gives a toy to Mohan"
- kaviwA ø mohana ko KilOnA xewI hE
- "Kavita gives a toy to Mohan"
- The verb agrees with rAma and kaviwA (xewA vs. xewI)
- Agreement helps in identifying k1 and k2 (a toy check follows below)
- But there are some exceptions to this
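A toy version of this cue, assuming the gender and number values come from the morphological analysis; the feature names are illustrative.

```python
def agrees(verb, noun):
    """The Hindi verb agrees with its unmarked (ø-postposition) argument in
    gender and number, e.g. masculine rAma -> xewA, feminine kaviwA -> xewI."""
    return verb["gender"] == noun["gender"] and verb["number"] == noun["number"]

rAma   = {"gender": "m", "number": "sg"}
kaviwA = {"gender": "f", "number": "sg"}
xewA   = {"gender": "m", "number": "sg"}   # 'gives' (masculine form)
xewI   = {"gender": "f", "number": "sg"}   # 'gives' (feminine form)

print(agrees(xewA, rAma), agrees(xewI, kaviwA))   # True True
print(agrees(xewA, kaviwA))                       # False
```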
19. Evaluation
- Two stage constraint based parser
- Data driven parsing
- Integrated
- 2 stage
20. Constraint based hybrid parsing
- Constraint satisfaction problem (Bharati et al., 2008a)
- Hard constraints
- Rule based
- Soft constraints
- ML
- Selective resolution of demands
- Repair
- Partial parses (a rough sketch of the interplay follows below)
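A very rough sketch of how hard and soft constraints might interact, with partial parses as the fallback; the candidate arcs, scoring function and pruning rule below are placeholders, not Bharati et al.'s actual formulation.

```python
def parse_with_constraints(candidates, hard_ok, soft_score):
    """candidates: possible (dependent, head, label) arcs for one clause.
    Hard, rule-based constraints prune the space; soft, ML-learned scores
    rank what survives; a dependent left with no licensed head is reported
    instead of failing, so a partial parse can still be returned."""
    by_dep = {}
    for arc in candidates:
        if hard_ok(arc):                              # hard constraints: keep or drop
            by_dep.setdefault(arc[0], []).append(arc)
    parse, unattached = [], []
    for dep in {arc[0] for arc in candidates}:
        options = by_dep.get(dep, [])
        if options:
            parse.append(max(options, key=soft_score))  # soft constraints rank
        else:
            unattached.append(dep)                    # left for repair / partial parse
    return parse, unattached

# Toy run: a hypothetical hard constraint says an ergative-marked noun
# cannot be k2, and the soft scores pick among the remaining options.
cands = [("rAma", "xiyA", "k1"), ("rAma", "xiyA", "k2"), ("KilOnA", "xiyA", "k2")]
hard_ok = lambda arc: not (arc[0] == "rAma" and arc[2] == "k2")
soft_score = lambda arc: {"k1": 0.9, "k2": 0.8}[arc[2]]
print(parse_with_constraints(cands, hard_ok, soft_score))
```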
22. Overall performance
Parser  UA    L     LA
CBP     86.1  65    63
CBP     90.1  76.9  75
MST     87.8  72.3  70.4
Malt    86.6  70.6  68.0

UA: unlabeled attachment accuracy; L: labeled accuracy; LA: labeled attachment accuracy
23. Error analysis
- Reasons for low LA
- Too few verb frames
- Some phenomena not covered
- Prioritization errors
24. Data driven parsing (Integrated)
- Tuning Malt and MST for the Hyderabad dependency treebank (Bharati et al., 2008b)
- Experiments with different features
- including minimal semantics and agreement
25. Experimental Setup
- Data
- 1800 sentences, average length of 19.85 words, 6585 unique tokens
- training set: 1178 sentences
- development and test sets: 352 and 363 sentences
26. Experimental Setup
- Parsers
- Malt, version 1.0.1 (Nivre et al., 2007)
- arc-eager
- SVM
- MST, version 0.4b (McDonald et al., 2005)
- Non-projective
- No. of highest scoring trees (k) = 5
- Extended feature set for both parsers
27. Consolidated results
28. Error analysis
- Reasons for low LA
- Difficulty in extracting relevant linguistic cues
- Agreement
- Similar contextual features leading to label bias
- Non-projectivity
- Lack of explicit cues
- Long distance dependencies
- Complex linguistic phenomena
- Small corpus size
29. Observations
- Features that proved crucial
- TAM (classes) and nominal postpositions
- Minimal semantics
- Animate-inanimate
- Human-nonhuman
- Agreement
- after making it explicitly visible to the parser as a feature
30. Data driven parsing: 2 stage (Bharati et al., 2009)
- MST parser
- Non-projective
- FEATS: nominal and verbal inflections, morph info. (an example follows below)
- Data
- 1492 sentences
- Training, development and testing: 1200, 100 and 192 sentences respectively
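To make the FEATS point concrete: in the CoNLL dependency format read by the parser, the per-token features are packed into a single pipe-separated field, so the inflections and morph information can be exposed there. A small sketch; the attribute names and the "attr-value" encoding are illustrative of the idea, not the exact treebank encoding.

```python
def feats_column(morph):
    """Pack the inflectional/morph attributes of a token into the single
    pipe-separated FEATS field of its CoNLL token line."""
    return "|".join(f"{attr}-{val}" for attr, val in morph.items())

# Illustrative values for 'rAma ne' (noun) and 'xiyA' ("gave", verb):
print(feats_column({"gen": "m", "num": "sg", "pers": "3", "case": "ne"}))
print(feats_column({"tam": "yA", "gen": "m", "num": "sg"}))
```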
31. Modular parsing
- Intra-clausal and Inter-clausal separately
- Introduce a dummy __ROOT__
- Parse clauses in 1st stage
- Then parse relations between clauses in 2nd stage
32. Comparison with the integrated parser
Parser                     LA     UA     L
Full (Stage 1 + Stage 2)   73.42  92.22  75.33
Integrated                 71.37  90.60  73.35

The two-stage parser gives an increase of 2.05, 1.62 and 1.98 points in LA, UA and L respectively.
33. Evaluation
Stage                     LA     UA     L
Stage 1 (Intra-clausal)   77.09  92.73  78.70
Stage 2 (Inter-clausal)   97.84  99.67  98.00
34. Advantages
- Learning long distance dependencies becomes easy
- Stage 2 specifically learns them efficiently
- Few non-projective sentences
- Only intra-clausal ones remain
- Search space becomes local
- Handling complex sentences becomes easy
35. Error analysis
- Reasons for low LA (in 1st stage)
- Unavailability of explicit cues
- Combining modular parsing with minimal semantics should help
- Difficulty in learning complex cues
- Agreement
- Similar contextual features leading to label bias
- Small corpus size
36. References
- R. Begum, S. Husain, A. Dhwaj, D. Sharma, L. Bai, and R. Sangal. 2008. Dependency annotation scheme for Indian languages. In Proceedings of IJCNLP-2008.
- A. Bharati and R. Sangal. 1993. Parsing Free Word Order Languages in the Paninian Framework. In Proceedings of ACL-93.
- A. Bharati, V. Chaitanya and R. Sangal. 1995a. Natural Language Processing: A Paninian Perspective. Prentice-Hall of India, New Delhi.
- A. Bharati, A. Gupta and R. Sangal. 1995b. Parsing with Nested Constraints. In Proceedings of the 3rd NLP Pacific Rim Symposium, Seoul.
- A. Bharati, R. Sangal and T. P. Reddy. 2002. A Constraint Based Parser Using Integer Programming. In Proceedings of ICON-2002.
- A. Bharati, S. Husain, D. Sharma, and R. Sangal. 2008a. A two-stage constraint based dependency parser for free word order languages. In Proceedings of COLIPS IALP, Chiang Mai, Thailand.
- A. Bharati, S. Husain, B. Ambati, S. Jain, D. Sharma, and R. Sangal. 2008b. Two semantic features make all the difference in parsing accuracy. In Proceedings of the 6th ICON, Pune, India.
- A. Bharati, S. Husain, S. P. K. Gadde, B. Ambati, and R. Sangal. 2009. A modular cascaded partial parsing approach to complete parsing. In submission.
- R. McDonald, F. Pereira, K. Ribarov, and J. Hajic. 2005. Non-projective dependency parsing using spanning tree algorithms. In Proceedings of HLT/EMNLP, pp. 523-530.
- J. Nivre, J. Hall, J. Nilsson, A. Chanev, G. Eryigit, S. Kübler, S. Marinov and E. Marsi. 2007. MaltParser: A language-independent system for data-driven dependency parsing. Natural Language Engineering, 13(2), 95-135.