Hindi Parsing - PowerPoint PPT Presentation

About This Presentation

Title:

Hindi Parsing

Description:

Hindi Parsing Samar Husain LTRC, IIIT-Hyderabad, India. Outline Introduction Grammatical framework Two stage parsing Evaluation Two stage constraint based parsing ... – PowerPoint PPT presentation

Slides: 38

Provided by: clearColo

Learn more at: https://clear.colorado.edu

Transcript and Presenter's Notes

Title: Hindi Parsing

1
Hindi Parsing

Samar Husain
LTRC, IIIT-Hyderabad,
India.

2
Outline

Introduction
Grammatical framework
Two stage parsing
Evaluation
Two stage constraint based parsing
Integrated data driven parsing
Two stage data driven parsing

3
Introduction

Broad coverage parser for Hindi
Very crucial
MT systems, IE, co-reference resolution, etc.
Attempt to make a hybrid parser
Grammatical framework: Dependency

4
Introduction

Levels of analysis before parsing
Morphological analysis (Morph Info.)
Analysis in local context (POS tagging, Chunking,
case markers/postpositions computation)
We parse after the above processing is done.

5
Computational Paninian Grammar (CPG)

Based on Panini's grammar
Inspired by an inflectionally rich language
(Sanskrit)
A dependency based analysis (Bharati et al.,
1995a)
Earlier parsing approaches for Hindi (Bharati et
al., 1993, 1995b, 2002)

6
CPG (The Basic Framework)

Treats a sentence as a set of modifier-modified
relations
Sentence has a primary modified or the root
(which is generally a verb)
Gives us the framework to identify these
relations
Relations between noun constituent and verb
called karaka
karakas are syntactico-semantic in nature
Syntactic cues help us in identifying the karakas

7
karta karma karaka

The boy opened the lock
k1: karta
k2: karma
karta, karma usually correspond to agent, theme
respectively
But not always
karakas are direct participants in the activity
denoted by the verb
For the complete list of dependency relations, see
Begum et al. (2008)

Dependency tree: open --k1--> boy, open --k2--> lock

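The tree on this slide can be written down as a small data structure. The following is a minimal sketch, not the representation used by the authors; the names `Arc`, `dependents_of` and `root_of` are invented here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    head: str       # the modified element
    dependent: str  # the modifier
    label: str      # dependency relation, e.g. a karaka label


# "The boy opened the lock": the verb is the root, the nouns are its karakas.
arcs = [
    Arc(head="open", dependent="boy", label="k1"),   # k1 = karta
    Arc(head="open", dependent="lock", label="k2"),  # k2 = karma
]


def dependents_of(head, arcs):
    """Return {label: dependent} for every modifier of `head`."""
    return {a.label: a.dependent for a in arcs if a.head == head}


def root_of(arcs):
    """The root is the single word that is modified but modifies nothing."""
    heads = {a.head for a in arcs}
    dependents = {a.dependent for a in arcs}
    (root,) = heads - dependents
    return root
```

Here `root_of(arcs)` picks out the verb and `dependents_of("open", arcs)` lists its karakas by label.
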
8
Hindi Parsing: Approaches tried

Two stage constraint based parsing
Data driven parsing
Integrated
Two stage

9
Two stage parsing

Basic idea
There are two layers (stages)
The 1st stage handles intra-clausal relations,
and the 2nd stage handles inter-clausal
relations
The output of each stage is a linguistically
sound partial parse that becomes the input to the
next stage

10
Stage 1

Identify intra-clausal relations
the argument structure of the verb,
noun-noun genitive relation,
infinitive-verb relation,
infinitive-noun relation,
adjective-noun,
adverb-verb relations,
nominal coordination, etc.

11
Stage 2

Identify inter-clausal relations
subordinating conjuncts,
coordinating conjuncts,
relative clauses, etc.

12
How do we do this?

Introduce a dummy __ROOT__ node as the root of
the dependency tree
Helps in giving linguistically sound partial
parses
Keeps the tree connected
Classify the dependency tags into two sets
Tags that function within a clause,
Tags that relate two clauses

13
An example

mai ghar gayaa kyomki mai bimaar thaa
I home went because I sick was
I went home because I was sick

14
The parses
(a) 1st stage output, (b) 2nd stage final parse
15
2 stage parsing

1st stage
All the clauses analyzed
Analyzed clauses become children of __ROOT__
Conjuncts become children of __ROOT__
2nd stage
Does not modify the 1st stage analysis
Identifies relations between 1st stage parsed
sub-trees
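
The two stages can be illustrated on the example sentence "mai ghar gayaa kyomki mai bimaar thaa". This is a minimal sketch under stated assumptions: apart from k1, the labels (`dest`, `pred`, `reason`, `conj-arg`) are placeholders invented for this example, the attachments shown are one plausible analysis rather than the authors' figure, and the stage 2 decisions are passed in by hand where a real parser would predict them.

```python
ROOT = "__ROOT__"

# 1st stage output: every clause is analyzed, and clause heads and the
# conjunct hang from the dummy __ROOT__ node so the tree stays connected.
# Each entry maps a token to (head, label).
stage1 = {
    "mai_1":  ("gayaa", "k1"),
    "ghar":   ("gayaa", "dest"),   # placeholder label
    "gayaa":  (ROOT, None),
    "kyomki": (ROOT, None),
    "mai_2":  ("thaa", "k1"),
    "bimaar": ("thaa", "pred"),    # placeholder label
    "thaa":   (ROOT, None),
}


def second_stage(stage1, inter_clausal):
    """Attach 1st stage sub-trees to each other.

    Only tokens that are children of __ROOT__ may be re-attached, so the
    1st stage analysis inside each clause is never modified.
    """
    parse = dict(stage1)
    for dependent, (head, label) in inter_clausal.items():
        if stage1[dependent][0] != ROOT:
            raise ValueError(f"{dependent} was already attached in stage 1")
        parse[dependent] = (head, label)
    return parse


# Hand-written stage 2 decisions (placeholder labels).
final = second_stage(stage1, {
    "kyomki": ("gayaa", "reason"),
    "thaa": ("kyomki", "conj-arg"),
})
```

After the call, only the main verb `gayaa` is still a child of `__ROOT__`, and every intra-clausal arc is identical to its 1st stage value.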

16
Important linguistic cues that help Hindi parsing

Nominal postpositions
TAM classes
Morphological features
root of the lexical item, etc.
POS/Chunk tags
Agreement
Minimal semantics
Animate-inanimate
Human-nonhuman

17
Nominal postpositions and TAM

rAma ø mohana ko KilOnA xewA hE
Ram Mohana DAT toy give
Ram gives a toy to Mohana
rAma ne mohana ko KilOnA xiyA
Ram ERG Mohana DAT toy gave
Ram gave Mohan a toy
rAma ko mohana ko KilOnA xenA padZA
Ram DAT Mohana DAT toy had to give
Ram had to give Mohan a toy
The TAM dictates the postposition that appears on
the noun rAma
Related concept in CPG
Verb frames and transformation rules (Bharati et
al., 1995)
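
The dependence of the postposition on the TAM can be sketched as a lookup table built from the three examples above. This is an illustration only: the TAM keys (`wA_hE`, `yA`, `nA_padZA`) are informal names chosen here, and the table is not the verb frame or transformation rule format of CPG.

```python
# Postposition expected on the karta for each TAM, read off the examples.
KARTA_POSTPOSITION_BY_TAM = {
    "wA_hE": "ø",      # xewA hE     -> rAma ø
    "yA": "ne",        # xiyA        -> rAma ne
    "nA_padZA": "ko",  # xenA padZA  -> rAma ko
}


def karta_candidates(nouns, tam):
    """nouns: list of (word, postposition) pairs in sentence order.

    Returns the words that carry the postposition this TAM predicts for
    the karta. More than one word may qualify.
    """
    wanted = KARTA_POSTPOSITION_BY_TAM[tam]
    return [word for word, postposition in nouns if postposition == wanted]


perfective = [("rAma", "ne"), ("mohana", "ko"), ("KilOnA", "ø")]
obligational = [("rAma", "ko"), ("mohana", "ko"), ("KilOnA", "ø")]
```

With the perfective TAM the cue is decisive, since only `rAma` carries `ne`. In the obligational sentence both `rAma` and `mohana` carry `ko`, so the postposition narrows the choice but other cues are still needed.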

18
Agreement

rAma ø mohana ko KilOnA xewA hE
Ram gives a toy to Mohana
kaviwA ø mohana ko KilOnA xewI hE
Kavita gives a toy to Mohana
Verb agrees with rAma and kaviwA
Agreement helps in identifying k1 and k2
But there are some exceptions to this.
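
A toy version of the agreement cue is sketched below. The gender values and the restriction to unmarked nouns are assumptions made for this example, not part of the slides, and the two dictionaries are hypothetical stand-ins for morphological analysis.

```python
# Assumed gender features (m = masculine, f = feminine).
NOUN_GENDER = {"rAma": "m", "kaviwA": "f", "mohana": "m", "KilOnA": "m"}
VERB_GENDER = {"xewA": "m", "xewI": "f"}


def agreeing_nouns(nouns, verb):
    """Return the unmarked nouns (postposition ø) whose gender matches the verb.

    nouns: list of (word, postposition) pairs in sentence order.
    """
    gender = VERB_GENDER[verb]
    return [
        word
        for word, postposition in nouns
        if postposition == "ø" and NOUN_GENDER[word] == gender
    ]


kavita_sentence = [("kaviwA", "ø"), ("mohana", "ko"), ("KilOnA", "ø")]
ram_sentence = [("rAma", "ø"), ("mohana", "ko"), ("KilOnA", "ø")]
```

Under these assumptions, `xewI` singles out `kaviwA`, while `xewA` matches both `rAma` and `KilOnA`, so agreement alone does not always separate k1 from k2.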

19
Evaluation

Two stage constraint based parser
Data driven parsing
Integrated
2 stage

20
Constraint based hybrid parsing

Constraint satisfaction problem (Bharati et al.
2008a)
Hard constraints
Rule based
Soft constraints
ML
Selective resolution of demands
Repair
Partial Parses
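
The split between hard and soft constraints can be sketched on "rAma ø mohana ko KilOnA xewA hE". This is a brute-force illustration, not the authors' solver: the `ALLOWED` table is a hypothetical frame, the label k4 is assumed here for the recipient, and the hand-written `soft_score` stands in for a learned model.

```python
from itertools import product

# Hypothetical frame: labels licensed by each postposition (rule based).
ALLOWED = {"ø": ("k1", "k2"), "ko": ("k2", "k4")}

# Minimal semantics assumed for the example.
HUMAN = {"rAma", "mohana"}


def hard_ok(labels):
    """Hard constraint: each label is used at most once per verb."""
    return len(set(labels)) == len(labels)


def soft_score(words, labels):
    """Soft constraint (stand-in for ML): prefer a human noun as k1."""
    return sum(1 for w, l in zip(words, labels) if l == "k1" and w in HUMAN)


def label_nouns(nouns):
    """nouns: list of (word, postposition). Returns (survivors, best)."""
    words = [w for w, _ in nouns]
    options = [ALLOWED[p] for _, p in nouns]
    # Hard constraints prune the candidate labelings...
    survivors = [ls for ls in product(*options) if hard_ok(ls)]
    # ...and soft constraints rank whatever is left.
    best = max(survivors, key=lambda ls: soft_score(words, ls))
    return survivors, dict(zip(words, best))


survivors, best = label_nouns(
    [("rAma", "ø"), ("mohana", "ko"), ("KilOnA", "ø")]
)
```

Two labelings survive the hard constraints, and the soft score prefers the one with `rAma` as k1. The sketch does not cover selective resolution of demands, repair, or partial parses.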

22
Overall performance
Parser   UA     L      LA
CBP      86.1   65     63
CBP      90.1   76.9   75
MST      87.8   72.3   70.4
Malt     86.6   70.6   68.0
UA: unlabeled attachment accuracy, L: label
accuracy, LA: labeled attachment accuracy
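The three measures can be computed from per-token (head, label) pairs as in the sketch below. The function name and the toy data are invented for illustration; the numbers in the table above come from the original evaluation, not from this code.

```python
def attachment_scores(gold, pred):
    """Return (UA, L, LA) as percentages.

    gold, pred: lists of (head, label) pairs, one per token, same order.
    UA counts correct heads, L counts correct labels, and LA counts
    tokens where both head and label are correct.
    """
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equally long")
    n = len(gold)
    ua = sum(g[0] == p[0] for g, p in zip(gold, pred))
    l = sum(g[1] == p[1] for g, p in zip(gold, pred))
    la = sum(g == p for g, p in zip(gold, pred))
    return tuple(100.0 * x / n for x in (ua, l, la))


# Toy 4-token sentence; heads are token positions, 0 is the root.
gold = [(3, "k1"), (3, "k2"), (0, "root"), (3, "adv")]
pred = [(3, "k1"), (3, "k1"), (0, "root"), (1, "adv")]
scores = attachment_scores(gold, pred)
```

One token has a wrong label and another a wrong head, so UA and L are both 75.0 while LA drops to 50.0. LA can never exceed either UA or L.
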
23
Error analysis

Reasons for low LA
Too few verb frames
Some phenomena not covered
Prioritization errors

24
Data driven parsing (Integrated)

Tuning Malt and MST for Hyderabad dependency
treebank (Bharati et al., 2008b)
Experiments with different features,
including minimal semantics and agreement

25
Experimental Setup

Data
1800 sentences, average length of 19.85 words,
6585 unique tokens.
Training set: 1178 sentences
Development and test sets: 352 and 363 sentences

26
Experimental Setup

Parsers
Malt-version 1.0.1 (Nivre et al., 2007)
arc eager
SVM
MST-version 0.4b (McDonald et al., 2005)
Non-projective
No. of highest scoring trees: k = 5
Extended feature set for both parsers

27
Consolidated results
28
Error analysis

Reasons for low LA
Difficulty in extracting relevant linguistic cues
Agreement
Similar contextual features: label bias
Non-projectivity
Lack of explicit cues
Long distance dependencies
Complex linguistic phenomena
Small corpus size

29
Observations

Features that proved crucial
TAM (classes) and nominal postpositions
Minimal semantics
Animate-inanimate
Human-nonhuman
Agreement
After making it visible

30
Data driven parsing: 2 stage (Bharati et al.,
2009)

MST parser
Non-projective
FEATS: nominal and verbal inflections, morph
info.
Data
1492 sentences
Training, development and testing 1200, 100 and
192 respectively.

31
Modular parsing

Intra-clausal and Inter-clausal separately
Introduce a dummy __ROOT__
Parse clauses in 1st stage
Then parse relations between clauses in 2nd stage

32
Comparison with integrated parser
Details                    Metric   Accuracy
Full (Stage 1 + Stage 2)   LA       73.42
Full (Stage 1 + Stage 2)   UA       92.22
Full (Stage 1 + Stage 2)   L        75.33
Integrated                 LA       71.37
Integrated                 UA       90.60
Integrated                 L        73.35
LA, UA and L increased by 2.05, 1.62 and 1.98
points respectively.
33
Evaluation
Details                    Metric   Accuracy
Stage 1 (Intra-clausal)    LA       77.09
Stage 1 (Intra-clausal)    UA       92.73
Stage 1 (Intra-clausal)    L        78.70
Stage 2 (Inter-clausal)    LA       97.84
Stage 2 (Inter-clausal)    UA       99.67
Stage 2 (Inter-clausal)    L        98.00
34
Advantages

Learning long distance dependencies becomes easy
Stage 2 specifically learns them efficiently
Few non-projective sentences
Only intra-clausal ones remain
Search space becomes local
Handling complex sentences becomes easy
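
Non-projectivity means that two dependency arcs cross when drawn above the sentence. The sketch below is a standard quadratic crossing check, given here only as an illustration of how one might count non-projective sentences; it is not taken from the parsers discussed, and the function name is invented.

```python
def is_projective(heads):
    """heads[i] is the head position of token i+1 (1-based); 0 is the root.

    A tree is projective if no two arcs cross, i.e. there are no arcs
    (a, b) and (c, d) with a < c < b < d.
    """
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for a, b in arcs:
        for c, d in arcs:
            if a < c < b < d:
                return False
    return True


projective = [2, 0, 2]         # tokens 1 and 3 both depend on token 2
non_projective = [3, 4, 0, 3]  # arc 1-3 crosses arc 2-4
```

Running the check on each stage's trees separately would show how many crossing arcs are left for each stage to handle.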

35
Error analysis

Reasons for low LA (in 1st stage)
Unavailability of explicit cues
Combining modular parsing with minimal semantics
should help
Difficulty in learning complex cues
Agreement
Similar contextual features: label bias
Small corpus size

36
References

R. Begum, S. Husain, A. Dhwaj, D. Sharma, L. Bai,
and R. Sangal. 2008. Dependency annotation scheme
for Indian languages. In Proceedings of
IJCNLP-2008.
A. Bharati and R. Sangal. 1993. Parsing Free Word
Order Languages in the Paninian Framework. In
Proc. of ACL93.
A. Bharati, V. Chaitanya, and R. Sangal. 1995a.
Natural Language Processing: A Paninian
Perspective. Prentice-Hall of India, New Delhi.
A. Bharati, A. Gupta, and R. Sangal. 1995b.
Parsing with Nested Constraints. In Proceedings
of 3rd NLP Pacific Rim Symposium, Seoul.
A. Bharati, R. Sangal, and T. P. Reddy. 2002. A
Constraint Based Parser Using Integer
Programming. In Proc. of ICON-2002.
A. Bharati, S. Husain, D. Sharma, and R. Sangal.
2008a. A two-stage constraint based dependency
parser for free word order languages. In
Proceedings of the COLIPS IALP, Chiang Mai,
Thailand.
A. Bharati, S. Husain, B. Ambati, S. Jain, D.
Sharma, and R. Sangal. 2008b. Two semantic
features make all the difference in parsing
accuracy. In Proceedings of the 6th ICON, Pune,
India.
A. Bharati, S. Husain, S. P. K. Gadde, B. Ambati,
and R. Sangal. 2009. A modular cascaded partial
parsing approach to complete parsing. In
Submission.
R. McDonald, F. Pereira, K. Ribarov, and J.
Hajič. 2005. Non-projective dependency parsing
using spanning tree algorithms. In Proc. of
HLT/EMNLP, pp. 523-530.
J. Nivre, J. Hall, J. Nilsson, A. Chanev, G.
Eryiğit, S. Kübler, S. Marinov, and E. Marsi.
2007. MaltParser: A language-independent system
for data-driven dependency parsing. Natural
Language Engineering, 13(2), 95-135.