Title: Constraint Based Hindi Parser
1Constraint Based Hindi Parser
LTRC, IIIT Hyderabad
2Introduction
- Broad coverage parser
- Very crucial
- IL-IL MT systems, IE, co-reference resolution,
etc.
3Why Dependency ?
- Phrase Structures
- Intrinsically presumes order
- Context Free Grammar (CFG) not well-suited for
free-word order languages (Shieber, 1985) - Particularly ill suited to Indian Languages
- Dependency Structures
- Gives flexibility
- Common structures
- With appropriate labels, closer to Semantics
4Computational Paninian Grammar (CPG)
- Based on Paninis Grammar (500 BC)
- Inspired by Inflectionally rich language
(Sanskrit) - A dependency based analysis
5Computational Paninian Grammar (The Basic
Framework)
- Treats a sentence as a set of modifier-modified
relations - Sentence has a primary modified or the root
(which is generally a verb) - Gives us the framework to identify these
relations - Relations between noun constituent and verb
called karaka - karakas are syntactico-semantic in nature
- Syntactic cues help us in identifying the karakas
6karta karma karaka
- The boy opened the lock
- k1 karta
- k2 karma
- karta, karma usually correspond to agent, theme
- But not always
- karakas are direct participants in the activity
denoted by the verb
open
k1
k2
boy
lock
7Basic karaka relations
- karta agent/doer/force
- Relation label k1
- karma object/patient
- Relation label k2
- karana instrument
- Relation label k3
- sampradaan beneficiary
- Relation label k4
- apaadaan source
- Relation label k5
- adhikarana location in place/time/other
- Relation label k7p/k7t/k7
- For complete list of dependency relations (Begum
et al., 2008)
8Basic karaka relations
raama phala khaataa hai Ram eats fruit
9Basic karaka relations
raama chaaku se saiv kaatataa hai Ram cuts the
apple with knife
10Basic karaka relations
raama ne mohana ko pustaka dii Ram gave a book
to Mohan
11Why Paninian Labels
- Other choices for labels could be
- Grammatical relations
- Subject, Object, etc.
- Behavioral tests (Mohanan, 1994)
- Thematic roles
- Agent, patient, etc.
- No concrete cues
- Difficult to extract them automatically
- Karakas can be computationally exploited
- Syntactically grounded, Semantically loaded
- Gives a level of interface
12Levels of Language Analysis
- Morphological analysis (Morph Info.)
- Analysis in local context (POS tagging)
- Sentence analysis (Chunking, Parsing)
- Semantic analysis (Word sense disambiguation,
etc.) - Discourse processing (Anaphora resolution,
Informational Structure, etc.)
13Example
- rAma ne mohana ko puswaka xI
14Example Parsed Output
xI give
k2
k1
k4
puswaka book
mohana
rAma
15Parser
- Two stage strategy
- Appropriate constraints formed
- Stage I (Intra-clausal relations)
- Dependency relations marked
- Relations such as k1, k2, k3, etc. for each verb
- Stage II (Inter-clausal relations conjunct
relations) - Conjuncts, relative clauses, kriya mula, etc
16Demand Frame for Verb
- A demand frame or karaka frame for a verb
indicates the demands the verb makes - It depends on the verb and its tense, aspect and
modality (TAM) label. - A mapping is specified between karaka relations
and vibhaktis (post-positions, suffix).
17Karaka Frame
- It specifies what karakas are mandatory or
optional for the verb and what vibhaktis
(post-positions) they take respectively - Each verb belongs to a specific verb class
- Each class has a basic karaka frame
- Each TAM specifies a transformation rule
18Example
- rAma mohana ko puswaka xewA hE
xewA hE give is
k2
k1
k4
puswaka book
mohana
rAma
Parsed Dependency Tree
19Transformations
- Based on the TAM of the verb
- rAma ne mohana ko KilOnA xiyA
- rAma ko mohana ko KilOnA xenA padZA
- Appropriate transformation applied
20Example
- rAma ne mohana ko puswaka xI
21Karaka Frame xe (give)
22Transformation Rule yA (TAM)
23Karaka Frame
yA TAM
rAma ne mohana ko KilOnA xiyA
Transformed frame for xe after applying the yA
trasformation
--------------------------------------------------
-------------------------------------- arc-label
necessity vibhakti lextype
src-pos arc-dir -------------------------
--------------------------------------------------
------------- k1 m
ne n l
c k2 m
0ko n l
c k3 d
se n l
c k4 d
ko n l
c --------------------------------------------
--------------------------------------------
0 ? ne
24Parsed Output
xI give
k2
k1
k4
puswaka book
mohana
rAma
25Other frames
26Steps in Parsing
SENTENCE
Morph, POS tagging, Chunking
Identify Demand Groups
Load Frames Transform
Find Candidates
Apply Constraints Solve
Final Parse
27Example
- rAma ne mohana ko KilOnA xiyA
28Identify the demand group,Load and Transform DF
- xiyA
- Only verb
- Transformed frame
- Use yA TAM info.
--------------------------------------------------
-------------------------------------- arc-label
necessity vibhakti lextype
src-pos arc-dir -------------------------
--------------------------------------------------
------------- k1 m
ne n l
c k2 m
0ko n l
c k3 d
se n l
c k4 d
ko n l
c --------------------------------------------
--------------------------------------------
29Candidates
k1
- rAma ne mohana ko KilOnA xiyA _ROOT_
main
k2
k2
k4
30Constraints
- C1 For each of the mandatory demands in a demand
frame for each demand group, there should be
exactly one outgoing edge labeled by the demand
from the demand group. - C2 For each of the optional demands in a demand
frame for each demand group, there should be at
most one outgoing edge labeled by the demand from
the demand group. - C3 There should be exactly one incoming arc into
each source group.
31Constraints
- A parse of a sentence is obtained by satisfying
all the above constraints - Ambiguous sentences have multiple parses
- Ill formed sentences have no parse.
32Parse - I
k1
- rAma ne mohana ko KilOnA xiyA _ROOT_
main
k2
k4
33Parse - I
_ROOT_
main
xiyA
k2
k1
k4
KilOnA
mohana
rAma
34Integer Programming Constraints
- Xijk represents a possible arc from word group i
to j with karaka label k - It takes a value 1 if the solution has that arc
and 0 otherwise. It cannot take any other values. - The constraint rules are formulated into
constraint equations.
35Constraint Equations
- C1 For each demand group i, for each of its
mandatory demands k, the following equalities
must hold - Mik Sj xikj 1
- C2 For each demand group i, for each of its
optional or desirable demands k, the following
inequalities must hold - Oik Sj xikj lt 1
- C3 For each of the source groups j, the
following equalities must hold - Sj Sik xikj 1
36Multiple Frames
- If more than one karaka frame for a verb
- Call Integer Programming package for each frame
- If more than one demand groups (e.g., multiple
verbs) in the sentence with multiple demand
frames - Call Integer Programming package for each
combination of such frames
37Other frames
- Common karaka frame
- Attached to each karaka frame
- Preference given to main frame if there are
clashes - Fallback karaka frame
- required karaka frame is missing
- Graceful degradation
38Stage I Types being handled
- Simple Verbs
- Non-finite verbs
- wA_huA
- wA_hI
- nA
- kara
- 0_rahe, etc.
- Copula
- Genitive
39Example (Complex Sentence)
- rAma ne phala khaakara mohana ko
- Ram ERG fruit having eaten Mohan
DAT - KilOnA xiyA
- toy gave
- Having eaten the fruit Ram gave the toy to
Mohan
40Candidates
X1 k1
- rAma ne phala khaakara mohana ko KilOnA xiyA
_ROOT_
X8 main
X4 k2
X7 vmod
X6 k2
X2 k2
X3 k2
X5 k4
41Constraint Equations
- Verb xe
- Mandatory Demands (C1)
- k1 ? x1 1
- k2 ? x2 x3 x4 1
- Optional Demands (C2)
- k4 ? x5 lt 1
- Verb khaa
- Mandatory Demands (C1)
- k2 ? x6 1
- vmod ? x7 1
- _ROOT_
- C1
- Main ? x8 1
42Constraint Equations (contd.)
- Incoming Arcs into Source (C3)
- rAma
- x1 1
- phala
- x4 x6 1
- khaa
- x7 1
- mohana
- x3 x5 1
- KilOnA
- x2 1
- xe
- x8 1
43Solution Graph
_ROOT_
main
xiyA
k2
k1
k4
vmod
mohana
rAma
KilOnA
khaakara
k2
phala
44References
- Akshar Bharati and Rajeev Sangal. 1993. Parsing
free word order languages in Paninian Framework.
ACL93, Proc.of Annual Meeting of Association of
Computational Linguistics, Association of
Computational Linguistics, New Jersey. USA. - Akshar Bharati, Rajeev Sangal, T Papi Reddy.
2002. A Constraint Based Parser Using Integer
Programming In Proc. of ICON-2002 International
Conference on Natural Language Processing. - Rafiya Begum, Samar Husain, Arun Dhwaj, Dipti
Misra Sharma, Lakshmi Bai and Rajeev Sangal.
2008. Dependency Annotation Scheme for Indian
Languages. In Proceedings of The Third
International Joint Conference on Natural
Language Processing (IJCNLP). Hyderabad, India. - S. M. Shieber. 1985. Evidence against the
context-freeness of natural language. In
Linguistics and Philosophy, p. 8, 334343. - Tara Mohanan, 1994. Arguments in Hindi. CSLI
Publications.
45