Title: Partial Dependency Parsing for Irish
1 Partial Dependency Parsing for Irish
- Elaine Uí Dhonnchadha, Josef Van Genabith
2 Aims of the Research
- To be able to parse and/or chunk unrestricted Irish text
- To account for as many of the syntactic phenomena of Irish as possible in an efficient and principled way
- To use open-source software as far as possible
3 Outline of the Talk
- Background
- Stages of Development for Dependency Parser
- Chunker
- Future Work
4 Irish Language: some facts
- Celtic Language
- Goidelic (Irish, Manx, Scottish Gaelic)
- Brittonic (Breton, Cornish, Welsh)
- Verb Subject Object (VSO) sentence word order
- Chaith Seán an liathróid.
  Threw Seán the ball. (V S O)
  'Seán threw the ball.'
- Fixed word order
5 Irish Language
- Inflectional language
- gender: fem/masc
- case: common/genitive/vocative
- verbs inflected for number and person
- chuala mé, 'I heard' (analytic)
- chualamar, 'we heard' (synthetic)
- Initial mutation of words
- cailín 'girl', an chailín 'the girl'
- arán 'bread', an t-arán 'the bread'
- seachtain 'week', an tseachtain 'the week'
- bord 'table', ar an mbord 'on the table'
6 Irish Language
- Prepositions inflected for person and number
- Labhair sé liom faoi
  Spoke he with-me about-it
  'He spoke to me about it'
- Tabhair dom é
  Give to-me it
  'Give it to me'
- Full paradigm for every preposition
- liom 'with-me', leat 'with-you', leis 'with-him/it', etc.
7 Irish Language
- Verbal noun: used in progressives, perfects, infinitives, etc.
- De-verbal nouns: bris(v) 'break', briseadh(vn) 'breaking'
- De-agentive nouns: feirmeoir(n) 'farmer', feirmeoireacht(vn) 'farming'
- Progressive
- Tá mé ag oscailt an dorais
  Is me at opening(vn) the door(gen)
  'I am opening the door'
- After Perfect
- Tá mé tar_éis an doras a oscailt
  Is me after the door PRT opening(vn)
  'I am after opening the door'
8 Parsing Methodology
- Dependency Analysis vs. Constituency Analysis
- Dependency Analysis
- Relationships between pairs of words
- Grammatical functions and head-modifier dependencies
- Root and terminal nodes
- Constituency Analysis
- Phrase structure rules, e.g. S → NP VP
- Hierarchical structure: root, phrase categories, leaf/terminal nodes
9 Dependency Analysis
- Issues in the theoretical syntax of Irish on which there is no clear consensus
- The non-adjacency of verb and object in a VSO language, i.e. difficulties with VP
- Some periphrastic aspectual constructions in Irish, e.g. progressive aspect, have more nominal than verbal characteristics
- Dependency Analysis includes semantic as well as syntactic information
10 Dependency Parsing
- A dependency analysis looks at dependencies between pairs of words (which do not have to be adjacent) in a sentence
- The tokens present in the input string are annotated without introducing any abstract categories (e.g. phrasal nodes)
- i.e. a dependency analysis consists of a root and leaf nodes, without intermediate levels
- Grammatical functions such as subject, object and predicate, as well as various types of prepositional phrase (e.g. adverbial, aspectual, predicative), are annotated
- Clauses and head-modifier dependencies are identified
11 Dependency Parsing
- Surface-oriented, bottom-up parsing
- Dependency relations between pairs of tokens
- Grammatical functions
- Head-modifier relations
- Tokens not necessarily adjacent
  V      Det  N     Det  N
  Bhris  an   fear  a    rúitín
  Broke  the  man   his  ankle
  'The man broke his ankle'
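A minimal Python sketch of how such an analysis could be stored: each token records its tag and the index of the token it depends on. The `Token` class and the head assignments are assumptions made for illustration (the slide shows only the tokens and their categories); the tags are taken from the tag set used in this talk.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    index: int
    form: str
    pos: str
    tag: str
    head: Optional[int]  # index of the governing token; None marks the root


# "Bhris an fear a rúitín" ('The man broke his ankle').
# Head indices are assumed for illustration, not taken from the slide.
SENTENCE = [
    Token(0, "Bhris", "V", "@FMV", None),
    Token(1, "an", "Det", "@>N", 2),
    Token(2, "fear", "N", "@SUBJ", 0),
    Token(3, "a", "Det", "@>N", 4),
    Token(4, "rúitín", "N", "@OBJ", 0),
]


def non_adjacent_dependencies(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Return (dependent, head) pairs whose tokens are not next to each other."""
    return [
        (tok.form, tokens[tok.head].form)
        for tok in tokens
        if tok.head is not None and abs(tok.index - tok.head) > 1
    ]


print(non_adjacent_dependencies(SENTENCE))
```

Under these assumed attachments, both the subject and the object depend on the verb without being adjacent to it, which is the situation a VSO language produces routinely.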
12 Previous NLP Work
- Tokenization and Morphological Analysis
- Finite-State Morphology (Karttunen, Beesley, 1999, 2003)
- Finite-State Morphological Analyser and Generator for Irish (Uí Dhonnchadha, 2002)
- POS Tagging and Parsing
- Constraint Grammar (CG) (Karlsson et al., 1995)
- Constraint Grammar Parser CG-2 (Tapanainen, 1996)
- VISL CG3 (Bick et al., 2003 ...) http://visl.sdu.dk
- Chunking
- Partial Parsing via Finite-State Cascades (Abney, 1996)
13 Stages of Development
- Define the syntactic phenomena to include
- Gather test data
- Decide on parsing methodology
- Decide on a tag set for dependency and grammatical relations
- Develop linguistic rules for dependency analysis
- Test the rules
- Evaluate the results
14 Syntactic Phenomena
- Sources of Information
- Grammar books
- Previous research on aspects of Irish syntax
- Simple declarative sentences (incl. negative and interrogative)
- Relative clauses
- Copular constructions
- Non-finite complements
- Adjuncts
15 Test Data (Gold Standard)
- Sample Sentences
- Short invented grammatical sentences (225) based on grammar books etc.
- Automatically POS tagged and manually checked and corrected
- Dependency tagged and manually checked and corrected
- Chunked and manually checked and corrected
- Corpus Data
- 250 real sentences
- randomly selected from the 3,000-sentence Gold Standard POS Tagged Corpus
- Dependency tagged, chunked and manually checked and corrected
16 Tag Set
- Grammatical Functions
- @SUBJ, @OBJ, @FMV, @FAUX, @CLB, etc.
- Unlabelled dependencies
- @>N, @N<, @P<, etc.
- Start with the @ symbol, by convention, to distinguish them from morphosyntactic tags
- Fuair faigh Verb VTI PastInd Len @FMV
- This tagset follows the style of tags described for English (Karlsson, 1995) and for Danish (Bick, 2003) [1]
- However, there is no prescribed list of tags for CG, which allows us to tailor the tagset to the language.
- [1] Other languages are also detailed on the VISL website http://visl.sdu.dk/corpus_linguistics.html
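The @ convention makes it easy to separate the two kinds of tag mechanically. The following Python sketch assumes a reading written as whitespace-separated fields (form, lemma, then tags); the function name `split_reading` and this exact layout are illustrative assumptions, not the output format of any particular CG tool.

```python
def split_reading(reading: str) -> dict:
    """Split 'form lemma TAG TAG ... @TAG' into its parts.

    Tags that start with '@' are treated as dependency/function tags,
    all other tags as morphosyntactic tags.
    """
    form, lemma, *tags = reading.split()
    return {
        "form": form,
        "lemma": lemma,
        "morph": [t for t in tags if not t.startswith("@")],
        "syntax": [t for t in tags if t.startswith("@")],
    }


print(split_reading("Fuair faigh Verb VTI PastInd Len @FMV"))
```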
17 Dependency Tags: Verbs and Copulas
Tag | Description | Example
@FMV | finite main verb | rith 'run'
@FMV_SUBJ | finite main verb including subject | ritheamar 'we ran'
@FMV_REL | relative finite main verb | a chuala mé 'that I heard'
@FMV_REL_SUBJ | relative finite main verb incl. subject | a chualamar 'that we heard'
@FAUX | finite auxiliary verb | Tá sé ag cócaireacht 'He is cooking'
@FAUX_SUBJ | finite auxiliary verb including subject | táimid 'we are'
@FAUX_REL | relative finite auxiliary verb | atá siad 'that/which they are'
@FAUX_REL_SUBJ | relative finite auxiliary verb including subject | atáimid 'that/which we are'
@COP | copula | Is
@COP_SUBJ | copula including subject | Seo an fear... 'This is the man...'
@COP_WH | interrogative copula | cé leis an leabhar 'whose is the book'
@INF | bare infinitive | Ba mhaith liom fanacht 'I would like to stay'
18 Dependency Tags: Grammatical Relations
Tag | Description | Example
@SUBJ | subject | Chonaic Seán Máire 'Seán saw Máire'
@SUBJ_ASP | subject of aspectual phrase | bhí sé ag obair 'he was working'
@SUBJ_INF | subject of infinitive (intrans.) | an obair a bheith déanta 'the work to be done'
@SUBJ_OR_OBJ | subject or object of relative clause | a chonaic an bhean 'that the woman saw' OR 'that saw the woman'
@SUBJ_REL | subject of relative clause | a rinne sé 'that he made'
@OBJ | object | Chonaic Seán Máire 'Seán saw Máire'
@OBJ_ASP | object of aspectual | ag déanamh oibre 'doing work'
@OBJ_INF | object of infinitive | bainne a ól 'to drink milk'
@PRED | predicate | Tá sé mór 'It is big'
@NP | unlabelled noun head, e.g. list item, apposition, or fragment | 1) dathuithe, 2) leasaithigh '1) colours, 2) additives'
@CC | co-ordinating conjunction | agus 'and'
@CLB | clause boundary | e.g. agus 'and' when followed by a verb, subordinating conjunctions, etc.
19 Dependency Tags: Head Modifiers (Unlabelled Dependencies)
Tag | Description | Example
@>ADJ | adverbial particle dependent on the adjective to the right | go ciúin 'quietly'
@>N | pre-modifier dependent on the first noun to the right | an 'the'
@>V | pre-verbal particle dependent on a verb to the right | ní 'not'
@ADVL< | adverbial post-modifier |
@N< | noun post-modifier | teach mór 'big house'
@P< | noun dependent on the preceding preposition | ag an doras 'at the door'
@PC< | noun dependent on compound preposition (in genitive case) | tar éis na Nollag 'after Christmas'
@PN< | pronoun post-modifier | é féin 'himself'
@PRED< | dependent on predicate | Is deas an lá é 'It is a nice day', i.e. Is nice the day it
@ADVL | adverbial | anocht 'tonight'
@AUG>SUBJ | augment pronoun dependent on subject to the right | Is é Seán 'It/He, Seán is'
20 Dependency Tags: Prepositional Phrases
Tag | Description | Example
@PP_ADVL | head of adverbial adjunct | ag an doras 'at the door'
@PP_ASP | head of an aspectual | ag rith '(at) running'
@PP_HAS | 'at X', meaning 'X has' | ag Seán 'Seán has', i.e. at Seán (possession)
@PP_NEG | negative | gan dul 'without going'
@PP_OBL | oblique PP head | do Mháire 'to Máire'
@PP_SUBJ | prepositional subject pronoun | D'éirigh liom 'I succeeded', i.e. 'success was with me'
@PP_PRED | predicative | Is liom é 'It is mine', i.e. Is with me it (ownership)
@PP_STAT | stative | ina rí 'is a king', i.e. 'in his king(hood)'
21 Parsing Methodology: Constraint Grammar
- Aims (Karlsson et al., 1995)
- assign the appropriate morphological and syntactic information according to the context of each token or larger structure in the text
- assign an analysis to every string in the input, bearing in mind that unrestricted text will contain typographical errors, non-sentential fragments, dialectal and colloquial material
- if an ambiguity cannot be resolved, the alternative analyses are retained rather than forcing a (possibly incorrect) choice
22 Constraint Grammar Principles
- Differences between CG and other parsing methodologies (Karlsson, 1995, p. 37):
- Unlike a context-free grammar, a Constraint Grammar does not attempt to define the set of grammatical sentences in a language.
- "... everything is licensed which is not explicitly ruled out", which makes it more robust in handling unrestricted text
- Does not aim to produce a minimal set of general rules: a CG grammar can contain many lexically-specific rules to handle special cases.
- Does not attempt to determine constituency structure.
23 CG Dependency Rules
- MAP (@TAG) TARGET (POS) IF (CONDITIONS) ;
- e.g.
- MAP (@FMV) TARGET (Verb) IF (NOT 0 VSYNTH OR AUX) (NOT -1 RELPART) (NOT -2 RELPART) ;
- SETS
- LIST VSYNTH = (Verb 1P) (Verb 2P) (Verb 3P) (Verb Auto) ;
- LIST AUX = ("bí") ("téigh") ("tosaigh") ("tosnaigh") ("féad") ("caith") ("féach") ;
- LIST RELPART = (Vb Rel) (Prep Rel) ;
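To make the logic of the example rule concrete, here is a rough Python simulation of it. This is a sketch, not how a CG compiler works: it assumes one reading per token, represents each token as a hypothetical dictionary built by `make_token`, and the lemma "clois" for chuala is supplied for illustration only.

```python
# Sets corresponding to the LIST definitions on the slide.
VSYNTH = [{"Verb", "1P"}, {"Verb", "2P"}, {"Verb", "3P"}, {"Verb", "Auto"}]
AUX = {"bí", "téigh", "tosaigh", "tosnaigh", "féad", "caith", "féach"}
RELPART = [{"Vb", "Rel"}, {"Prep", "Rel"}]


def make_token(form, lemma, *tags):
    """Hypothetical token record: one reading per token (a simplification)."""
    return {"form": form, "lemma": lemma, "tags": set(tags), "syntax": []}


def in_tag_list(token, tag_list):
    """True if the token carries every tag of at least one entry in the list."""
    return any(required <= token["tags"] for required in tag_list)


def is_relpart(tokens, i):
    """True if position i exists and holds a relative particle."""
    return 0 <= i < len(tokens) and in_tag_list(tokens[i], RELPART)


def map_fmv(tokens):
    """Add @FMV to verbs that are neither synthetic nor auxiliary and are not
    preceded (one or two positions back) by a relative particle."""
    for i, tok in enumerate(tokens):
        if "Verb" not in tok["tags"]:
            continue  # TARGET (Verb)
        if in_tag_list(tok, VSYNTH) or tok["lemma"] in AUX:
            continue  # NOT 0 VSYNTH OR AUX
        if is_relpart(tokens, i - 1) or is_relpart(tokens, i - 2):
            continue  # NOT -1 RELPART, NOT -2 RELPART
        tok["syntax"].append("@FMV")
    return tokens


MAIN = [
    make_token("Fuair", "faigh", "Verb", "VTI", "PastInd", "Len"),
    make_token("sé", "sé", "Pron", "Pers", "3P", "Sg", "Masc", "Sbj"),
    make_token("leabhar", "leabhar", "Noun", "Masc", "Com", "Sg"),
]
RELATIVE = [
    make_token("a", "a", "Part", "Vb", "Rel", "Direct"),
    make_token("chuala", "clois", "Verb", "VTI", "PastInd", "Len"),
    make_token("mé", "mé", "Pron", "Pers", "1P", "Sg"),
]

for sentence in (MAIN, RELATIVE):
    print([(t["form"], t["syntax"]) for t in map_fmv(sentence)])
```

In this sketch Fuair receives @FMV, while chuala is left untagged because a relative particle precedes it, so a separate rule (for example one mapping @FMV_REL) would be free to apply.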
24 Order of Implementation of Rules
- Dependency Analysis is carried out in the following order:
- Clause Boundaries
- Verbs and/or Copulas
- Preposition Heads
- All Dependent Modifiers
- Subject
- Predicates of Copular Constructions
- Object(s)
- Adverbials
- Other
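The ordering matters because later rule groups can rely on tags assigned by earlier ones. The Python sketch below illustrates this with four toy stages (verbs, modifiers, subject, object) applied in the same relative order as the list above. The stage functions are deliberately naive and hypothetical; they are not the rules of the actual grammar.

```python
NOMINAL = {"Noun", "Pron"}


def tag_verbs(tokens):
    for tok in tokens:
        if tok["pos"] == "Verb":
            tok["tag"] = "@FMV"


def tag_modifiers(tokens):
    for tok in tokens:
        if tok["pos"] == "Art":
            tok["tag"] = "@>N"


def tag_subject(tokens):
    # Toy rule: the first untagged nominal after a finite main verb.
    for i, tok in enumerate(tokens):
        if tok["tag"] == "@FMV":
            for later in tokens[i + 1:]:
                if later["pos"] in NOMINAL and later["tag"] is None:
                    later["tag"] = "@SUBJ"
                    return


def tag_object(tokens):
    # Toy rule: the first untagged nominal after the subject.
    seen_subject = False
    for tok in tokens:
        if tok["tag"] == "@SUBJ":
            seen_subject = True
        elif seen_subject and tok["pos"] in NOMINAL and tok["tag"] is None:
            tok["tag"] = "@OBJ"
            return


# Same relative order as on the slide: verbs, modifiers, subject, object.
STAGES = [tag_verbs, tag_modifiers, tag_subject, tag_object]


def analyse(words, stages=STAGES):
    tokens = [{"form": form, "pos": pos, "tag": None} for form, pos in words]
    for stage in stages:
        stage(tokens)
    return [(tok["form"], tok["tag"]) for tok in tokens]


WORDS = [("Chaith", "Verb"), ("Seán", "Noun"), ("an", "Art"), ("liathróid", "Noun")]
print(analyse(WORDS))
# Running the subject and object stages before the verb stage leaves both untagged.
print(analyse(WORDS, [tag_subject, tag_object, tag_verbs, tag_modifiers]))
```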
25 Example (1)
- Fuair    faigh Verb VT PastInd
  sé       sé Pron Pers 3P Sg Masc Sbj
  leabhar  leabhar Noun Masc Com Sg
  ins      i Prep Art Sg
  an       an Art Sg Def
  siopa    siopa Noun Masc Com Sg DefArt
- Fuair  sé     leabhar  ins       an   siopa
  Got    he     book     in        the  shop
  V      Pro    N        Prep      Det  N
  @FMV   @SUBJ  @OBJ     @PP_ADVL  @>N  @P<
  'He got a book in the shop'
27 Example (1)
- root  Fuair  sé     leabhar  ins       an   siopa
        Got    he     book     in        the  shop
        V      Pro    N        Prep      Det  N
        @FMV   @SUBJ  @OBJ     @PP_ADVL  @>N  @P<
  'He got a book in the shop'
28 Example (2)
- Chonaic  Máire  an   fear       a     bhí    ag       ithe
  Saw      Máire  the  man        that  was    at       eating
  V        N      Det  N          Rel   V      Prep     VN
  @FMV     @SUBJ  @>N  @SUBJ_REL  @>V   @FAUX  @PP_ASP  @P<
  'Máire saw the man that was eating'
- ag       ithe
  Prep     VN      (FORM)
  @PP_ASP  @P<     (FUNCTION)
  'eating'
29 Development/Test Cycle
30 Evaluation of Dependency Analysis
- Sample Sentences: 225 short grammatical sentences; Precision (Test Suite)
- Gold Standard Dependency Analysis Corpus
- 250 sentences randomly selected from the 3,000-sentence Gold Standard POS Tagged Corpus
Gold Standard Development Set (150 Sentences)
Tot Tokens | Punct. Tokens | Tokens | Correct | Incorrect | F-Score
4403 | 444 | 3959 | 3706 | 253 | 93.60
Gold Standard Test Set (100 Sentences)
Tot Tokens | Punct. Tokens | Tokens | Correct | Incorrect | F-Score
2555 | 282 | 2273 | 2143 | 130 | 94.28
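The reported scores can be cross-checked from the counts in the tables. The sketch below assumes that the score is the percentage of non-punctuation tokens tagged correctly (when every token receives exactly one tag, precision, recall and F-score coincide with this figure). That reading is inferred from the numbers rather than stated on the slide, and the function name is hypothetical.

```python
def token_accuracy(total_tokens: int, punct_tokens: int, correct: int):
    """Return (scored tokens, percentage correct), excluding punctuation."""
    scored = total_tokens - punct_tokens
    return scored, 100.0 * correct / scored


for name, total, punct, correct in [
    ("development", 4403, 444, 3706),
    ("test", 2555, 282, 2143),
]:
    scored, pct = token_accuracy(total, punct, correct)
    print(f"{name}: {correct}/{scored} = {pct:.2f}%")
```

Both results agree with the reported figures to within rounding.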
31 Chunking
- Using the dependency annotations and a regular expression grammar (implemented using Xerox Finite-State Tools [1]) we can identify phrase-like structures, described by Abney (1991) as 'chunks'.
- [1] For details see http://www.cis.upenn.edu/cis639/docs/xfst.html
32 Implementation
- Regular expressions and Xerox FST
- Chunks
- NP .. , V .. etc.
- PP with embedded NP
- PP .. NP ..
- Conjunction with embedded conjoint
- CJ2 .. ?
- NP úlla CJ2 agus NP oráistí NP
- 'apples and oranges'
- Aspectual phrases
- ASP PP-ASP .. NP .. (OA ..)
- ASP PP-ASP ag NP dúnadh OA an dorais
- 'closing the door'
33 Example (3)
- "<Tá>" "bí" Verb VI PresInd @FAUX  (Is)
  "<sé>" "sé" Pron Pers 3P Sg Masc Sbj @SUBJ  (he)
  "<ag>" "ag" Prep Simp @PP_ASP  (at)
  "<rith>" "rith" Verbal Noun VTI @P<  (running)
- 'He is running'
- S V Tá bí Verb VI PresInd @FAUX
  NP sé sé Pron Pers 3P Sg Masc Sbj @SUBJ NP
  ASP PP-ASP ag ag Prep Simp @PP_ASP
  NP rith rith Verbal Noun VTI @P< NP PP-ASP ASP S
34 Regular Expression Chunker
- Verb Chunk Dependency Tags
- define VTag "@FAUX" | "@FAUX_REL" | "@FMV" | "@FMV_REL" ;
- define VSTag "@FAUX_SUBJ" | "@FAUX_REL_SUBJ" | "@FMV_SUBJ" | "@FMV_REL_SUBJ" ;
- define PreVTag "@>V" ;
- Verb Pre- and Post-Modifiers
- define PreVStr TokLemMTag PreVTag SP ;
- Verb Chunk
- define VStr TokLemMTag VTag SP ;
- define VChunk PreVStr VStr ;
- define VChunkBr VChunk @-> "V " ... " " ;
- Verb_Subject Chunk
- define VSStr TokLemMTag VSTag SP ;
- define VSChunk PreVStr VSStr ;
- define VSChunkBr VSChunk @-> "VS " ... " " ;
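The same idea can be sketched with Python's `re` module in place of xfst. Everything about the input encoding here is an assumption made for illustration: each token is written `form|lemma|morph|@TAG`, tokens are separated by single spaces, and the chunk is marked with `[V ... ]`. Optional pre-verbal particles are allowed with `*`, which may differ from the original definitions.

```python
import re

# One token up to (not including) its dependency tag: form|lemma|morph|
TOK_LEM_MTAG = r"[^\s|]+\|[^\s|]+\|[^\s|]*\|"
# Verb tags that head a verb chunk (longer alternatives listed first).
V_TAG = r"(?:@FAUX_REL|@FAUX|@FMV_REL|@FMV)"
PRE_V_TAG = r"@>V"

PRE_V_STR = rf"{TOK_LEM_MTAG}{PRE_V_TAG} "   # pre-verbal particle plus space
V_STR = rf"{TOK_LEM_MTAG}{V_TAG}"            # the verb itself

# A verb chunk: optional pre-verbal particles followed by a tagged verb.
# The lookarounds keep the match aligned with whole tokens, so that
# @FMV does not match inside @FMV_SUBJ.
V_CHUNK = re.compile(rf"(?<!\S)(?:{PRE_V_STR})*{V_STR}(?!\S)")


def bracket_verb_chunks(text: str) -> str:
    """Wrap every verb chunk in '[V ... ]' (assumed bracket format)."""
    return V_CHUNK.sub(lambda m: "[V " + m.group(0) + " ]", text)


TAGGED = (
    "fear|fear|Noun|@SUBJ_REL a|a|Part.Vb.Rel|@>V "
    "bhí|bí|Verb.PastInd|@FAUX ag|ag|Prep.Simp|@PP_ASP ithe|ith|Verbal.Noun|@P<"
)
print(bracket_verb_chunks(TAGGED))
```

Because the chunker keys on the dependency tags rather than on word forms, the pattern stays small: the linguistic decisions have already been made by the dependency rules.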
35 Example (4)
- Tá bí Verb VI PresInd @FAUX  (Is)
  mé mé Pron Pers 1P Sg @SUBJ_ASP  (I)
  ag ag Prep Simp @PP_ASP  (at)
  déanamh déanamh Verbal Noun VTI @P<  (making)
  cáca cáca Noun Masc Gen Sg @OBJ_ASP  (cake)
  . . Punct Fin  (.)
- 'I am making a cake'
- S V Tá bí Verb VI PresInd @FAUX V
  NP mé mé Pron Pers 1P Sg @SUBJ_ASP NP
  ASP PP-ASP ag ag Prep Simp @PP_ASP
  NP déanamh déanamh Verbal Noun @P< NP PP-ASP
  OA cáca cáca Nn Msc Gen Sg @OBJ_ASP OA ASP
  . . Punct Fin S
36 Corpus Data
- Ach sin an toradh is measa a fhéadfadh tarlú don pháirtí agus déarfaidís leat nár cóir an iomad airde a thabhairt do na pobalbhreitheanna nach raibh riamh fabhrach do na páirtithe beaga.
- 'But that is the worst possible result for the party and they would say to you that it is not right to pay too much attention to the opinion polls that were never favourable to small parties.'
37 Dependency Analysis
- S
- CONJ Ach ach Conj Subord @CLB
- COP Sin sin Cop Pro Dem @COP_SUBJ
- NP an an Art Sg Def @>N
  toradh toradh Noun Msc Com Sg DefArt @PRED
- is is Part Sup @>ADJ
- measa olc Adj Comp @N< NP
- VP a a Part Vb Rel Direct @CLB
- fhéadfadh féad Verb VTI Cond Len @FAUX_REL
- INF tarlú tarlú Verbal Noun VTI @INF INF
- PP don do Prep Art Sg @PP_ADVL
- NP pháirtí páirtí Noun Masc Com Sg Len @P< NP PP
- CB agus agus Conj Coord @CLB
- VS déarfaidís abair Verb VTI Cond 3P Pl @FMV_SUBJ
- PP leat le Pron Prep 2P Sg @PP_ADVL PP
- COP nár is Cop Past Rel Neg @CLB
- PRED cóir cóir Adj Base @PRED
- INF an an Art Sg Def @>N
  iomad iomad Subst Noun Sg @OBJ_INF
  airde aird Noun Fem Gen Sg @N<
  I a a Prep Simp @PP_INF
  thabhairt tabhairt Verbal Noun VTI Len @P< I INF
- PP do do Prep Simp @PP_ADVL
  NP na na Art Pl Def @>N
  pobalbhreitheanna pobalbhreith Noun Fem Com Pl @P< NP PP
- V nach nach Part Vb Neg Rel @CLB
  raibh bí Verb PastInd Neg Len @FMV_REL
- PRED riamh riamh Adv Its @>ADJ
  fabhrach fabhrach Adj Base @PRED
- PP do do Prep Simp @PP_ADVL
  NP na na Art Pl Def @>N
  páirtithe páirtí Noun Masc Com Pl DefArt @P<
  beaga beag Adj Com NotSlen Pl @N< NP PP
- . Punct Fin S
38 Evaluation of Chunker
- Evalb program used to evaluate bracketing of 250 sentences
150 Development Set Sentences
Measure | All sentences | Sentences Len<40
Number of sentences | 150 | 120
Bracketing Recall | 96.26 | 97.31
Bracketing Precision | 98.15 | 98.57
Bracketing F-Measure | 97.20 | 97.94
100 Test Set Sentences
Measure | All sentences | Sentences Len<40
Number of sentences | 100 | 85
Bracketing Recall | 92.89 | 94.09
Bracketing Precision | 94.12 | 94.09
Bracketing F-Measure | 93.50 | 94.09
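The F-measure values follow from the recall and precision figures as their harmonic mean, F = 2PR / (P + R). A short Python check, using only the numbers reported above (the function and variable names are illustrative):

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (balanced F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (recall, precision, reported F-measure) as given in the tables.
RESULTS = {
    ("development", "all"): (96.26, 98.15, 97.20),
    ("development", "len<40"): (97.31, 98.57, 97.94),
    ("test", "all"): (92.89, 94.12, 93.50),
    ("test", "len<40"): (94.09, 94.09, 94.09),
}

for (dataset, subset), (recall, precision, reported) in RESULTS.items():
    computed = f_measure(precision, recall)
    print(f"{dataset:12s} {subset:7s} computed={computed:.2f} reported={reported:.2f}")
```

Recomputing from the rounded percentages reproduces every reported F-measure to within 0.01.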
39 Future Work
- Partial parsing to date, as we have not yet addressed:
- Co-ordination
- He packed his clothes and shoes
- He packed his clothes and left
- PP-attachment
- He stabbed the man with the knife (PP attached to the verb: the knife is the instrument)
- He stabbed the man with the knife (PP attached to the noun: the man has the knife)
- PP-function
- locative vs. stative
- adjunct vs. indirect object
- adding additional info in the FS lexicons, e.g. noun sub-classes, subcategorisation frames for verbs
- Irish Text Processing Tools: http://www.scss.tcd.ie/Elaine.UiDhonnchadha/irish.utf8.htm