Learn more at: http://www.lrec-conf.org

Title: Partial Dependency Parsing for Irish


1
Partial Dependency Parsing for Irish
  • Elaine Uí Dhonnchadha & Josef Van Genabith

2
Aims of the Research
  • To be able to parse and/or chunk unrestricted
    Irish text
  • To account for as much of the syntactic
    phenomena of Irish as possible in an efficient
    and principled way
  • To use open-source software as far as possible

3
Outline of the Talk
  • Background
  • Stages of Development for Dependency Parser
  • Chunker
  • Future Work

4
Irish Language: some facts
  • Celtic Language
  • Goidelic (Irish, Manx, Scottish Gaelic)
  • Brittonic (Breton, Cornish, Welsh)
  • Verb Subject Object sentence word order
  • Chaith Seán an liathróid.
    Threw Seán the ball. (V S O)
    'Seán threw the ball.'
  • Fixed word order

5
Irish Language
  • Inflectional language
  • gender fem/masc
  • case common/genitive/vocative
  • verbs inflected for number and person
  • chuala mé, I heard (analytic)
  • chualamar, we heard (synthetic)
  • Initial mutation of words
  • cailín girl, an chailín the girl
  • arán bread, an t-arán the bread
  • seachtain week, an tseachtain the week
  • bord table, ar an mbord on the table

6
Irish Language
  • Prepositions inflected for person and number.
  • Labhair sé liom faoi.
    Spoke he with-me about-it.
    'He spoke to me about it.'
  • Tabhair dom é.
    Give to-me it.
    'Give it to me.'
  • Full paradigm for every preposition
  • liom 'with-me', leat 'with-you', leis 'with-him/it',
    etc.

7
Irish Language
  • Verbal noun - used in progressives, perfects,
    infinitives, etc.
  • De-verbal nouns: bris (v) 'break', briseadh (vn)
    'breaking'
  • De-agentive nouns: feirmeoir (n) 'farmer',
    feirmeoireacht (vn) 'farming'
  • Progressive
  • Tá mé ag oscailt an dorais.
    Is I at opening(vn) the door(gen).
    'I am opening the door.'
  • After Perfect
  • Tá mé tar_éis an doras a oscailt.
    Is I after the door PRT opening(vn).
    'I am after opening the door.'

8
Parsing Methodology
  • Dependency Analysis vs. Constituency Analysis
  • Dependency Analysis
  • Relationships between pairs of words
  • Grammatical Functions and Head-Modifier
    dependencies
  • Root and terminal nodes
  • Constituency Analysis
  • Phrase Structure Rules, e.g. S → NP VP
  • Hierarchical structure root, phrase categories,
    leaf/terminal nodes

9
Dependency Analysis
  • Issues in the theoretical syntax of Irish on
    which there is no clear consensus
  • The non-adjacency of verb and object in a VSO
    language, i.e. difficulties with VP
  • Some periphrastic aspectual constructions in
    Irish, e.g. progressive aspect has more nominal
    than verbal characteristics
  • Dependency Analysis includes semantic as well as
    syntactic information

10
Dependency Parsing
  • A dependency analysis looks at dependencies
    between pairs of words (which do not have to be
    adjacent) in a sentence
  • The tokens present in the input string are
    annotated without introducing any abstract
    categories (e.g. phrasal nodes)
  • i.e. dependency analysis consists of a root, and
    leaf nodes, without intermediate levels
  • Grammatical functions such as subject, object,
    predicate, as well as various types of
    prepositional phrase, e.g. adverbial, aspectual,
    predicative, etc. are annotated
  • Clauses and head-modifier dependencies are
    identified

11
Dependency Parsing
  • Surface-oriented, bottom-up parsing
  • Dependency relations between pairs of tokens
  • Grammatical functions
  • Head-modifier relations
  • Tokens not necessarily adjacent.

  V      Det  N     Det  N
  Bhris  an   fear  a    rúitín
  Broke  the  man   his  ankle
  'The man broke his ankle.'

12
Previous NLP Work
  • Tokenization & Morphological Analysis
  • Finite-State Morphology (Karttunen, Beesley,
    1999 2003)
  • Finite-State Morphological Analyser & Generator
    for Irish (Uí Dhonnchadha, 2002)
  • POS Tagging and Parsing
  • Constraint Grammar (CG) (Karlsson et al., 1995)
  • Constraint Grammar Parser CG-2 (Tapanainen,
    1996)
  • VISL CG3 (Bick et al., 2003 ...)
    http://visl.sdu.dk
  • Chunking
  • Partial Parsing via Finite-State Cascades (Abney,
    1996)

13
Stages of Development
  • Define the Syntactic Phenomena to include
  • Gather Test Data
  • Decide on Parsing Methodology
  • Decide on a Tag-Set for dependency and grammatical
    relations
  • Develop Linguistic Rules for dependency analysis
  • Test the rules
  • Evaluate the results

14
Syntactic Phenomena
  • Sources of Information
  • Grammar books
  • Previous research on aspects of Irish Syntax
  • Simple declarative sentences (incl. neg. and
    interrogative)
  • Relative clauses
  • Copular constructions
  • Non-finite complements
  • Adjuncts

15
Test Data (Gold Standard)
  • Sample Sentences
  • Short invented grammatical sentences (225) based
    on grammar books etc.
  • Automatically POS tagged and manually checked and
    corrected
  • Dependency tagged and manually checked and
    corrected
  • Chunked and manually checked and corrected
  • Corpus Data
  • Corpus data 250 real sentences
  • randomly selected from the 3000 sentence Gold
    Standard POS Tagged Corpus
  • Dependency tagged, chunked and manually checked
    and corrected

16
Tag Set
  • Grammatical Functions
  • @SUBJ, @OBJ, @FMV, @FAUX, @CLB, etc.
  • Unlabelled dependencies
  • @>N, @N<, @P<, etc.
  • Start with the @ symbol, by convention, to
    distinguish them from morphosyntactic tags
  • Fuair faigh Verb VTI PastInd Len @FMV
  • This tagset follows the style of tags described
    for English (Karlsson, 1995), and for Danish
    (Bick, 2003) [1]
  • However, there is not a prescribed list of tags
    for CG, which allows us to tailor the tagset to
    the language.
  • [1] Other languages are also detailed on the
    VISL website
    http://visl.sdu.dk/corpus_linguistics.html
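
The @ convention makes it easy to separate dependency tags from morphosyntactic tags mechanically. The following is a minimal Python sketch, assuming each analysis is written on one line as word form, lemma, then whitespace-separated tags (as in the Fuair example above); the `Reading` class and `parse_reading` function are hypothetical names, not part of any CG tool.

```python
from dataclasses import dataclass, field


@dataclass
class Reading:
    form: str
    lemma: str
    morph_tags: list = field(default_factory=list)
    dep_tags: list = field(default_factory=list)


def parse_reading(line: str) -> Reading:
    """Split one analysis line into form, lemma, morphosyntactic tags and @-tags."""
    parts = line.split()
    if len(parts) < 2:
        raise ValueError(f"expected at least a form and a lemma: {line!r}")
    # Tolerate CG-style quoting such as "<Fuair>" "faigh" (an assumption).
    form = parts[0].strip('"<>')
    lemma = parts[1].strip('"')
    tags = parts[2:]
    return Reading(
        form=form,
        lemma=lemma,
        # Anything that starts with @ is a dependency/function tag.
        morph_tags=[t for t in tags if not t.startswith("@")],
        dep_tags=[t for t in tags if t.startswith("@")],
    )


reading = parse_reading("Fuair faigh Verb VTI PastInd Len @FMV")
print(reading.dep_tags)    # ['@FMV']
print(reading.morph_tags)  # ['Verb', 'VTI', 'PastInd', 'Len']
```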

17
Dependency Tags: Verbs and Copulas
  • @FMV: finite main verb; e.g. rith 'run'
  • @FMV_SUBJ: finite main verb including subject; e.g. ritheamar 'we ran'
  • @FMV_REL: relative finite main verb; e.g. a chuala mé 'that I heard'
  • @FMV_REL_SUBJ: relative finite main verb including subject; e.g. a chualamar 'that we heard'
  • @FAUX: finite auxiliary verb; e.g. Tá sé ag cócaireacht 'He is cooking'
  • @FAUX_SUBJ: finite auxiliary verb including subject; e.g. táimid 'we are'
  • @FAUX_REL: relative finite auxiliary verb; e.g. atá siad 'that/which they are'
  • @FAUX_REL_SUBJ: relative finite auxiliary verb including subject; e.g. atáimid 'that/which we are'
  • @COP: copula; e.g. Is
  • @COP_SUBJ: copula including subject; e.g. Seo an fear... 'This is the man...'
  • @COP_WH: interrogative copula; e.g. cé leis an leabhar 'whose is the book'
  • @INF: bare infinitive; e.g. Ba mhaith liom fanacht 'I would like to stay'
18
Dependency Tags: Grammatical Relations
  • @SUBJ: subject; e.g. Chonaic Seán Máire 'Seán saw Máire'
  • @SUBJ_ASP: subject of aspectual phrase; e.g. bhí sé ag obair 'he was working'
  • @SUBJ_INF: subject of infinitive (intrans.); e.g. an obair a bheith déanta 'the work to be done'
  • @SUBJ_OR_OBJ: subject or object of relative clause; e.g. a chonaic an bhean 'that the woman saw' OR 'that saw the woman'
  • @SUBJ_REL: subject of relative clause; e.g. a rinne sé 'that he made'
  • @OBJ: object; e.g. Chonaic Seán Máire 'Seán saw Máire'
  • @OBJ_ASP: object of aspectual; e.g. ag déanamh oibre 'doing work'
  • @OBJ_INF: object of infinitive; e.g. bainne a ól 'to drink milk'
  • @PRED: predicate; e.g. Tá sé mór 'It is big'
  • @NP: unlabelled noun head, e.g. list item, apposition, or fragment; e.g. 1) dathuithe, 2) leasaithigh '1) colours, 2) additives'
  • @CC: co-ordinating conjunction; e.g. agus 'and'
  • @CLB: clause boundary; e.g. agus 'and' when followed by a verb, subordinating conjunctions, etc.
19
Dependency Tags: Head Modifiers (Unlabelled Dependencies)
  • @>ADJ: adverbial particle dependent on the adjective to the right; e.g. go ciúin 'quietly'
  • @>N: pre-modifier dependent on the first noun to the right; e.g. an 'the'
  • @>V: pre-verbal particle dependent on a verb to the right; e.g. ní 'not'
  • @ADVL<: adverbial post-modifier
  • @N<: noun post-modifier; e.g. teach mór 'big house'
  • @P<: noun dependent on the preceding preposition; e.g. ag an doras 'at the door'
  • @PC<: noun dependent on a compound preposition (the noun is in the genitive case); e.g. tar éis na Nollag 'after Christmas'
  • @PN<: pronoun post-modifier; e.g. é féin 'himself'
  • @PRED<: dependent on predicate; e.g. Is deas an lá é 'It is a nice day', i.e. 'Is nice the day it'
  • @ADVL: adverbial; e.g. anocht 'tonight'
  • @AUG>SUBJ: augment pronoun dependent on the subject to the right; e.g. Is é Seán 'It/He, Seán is'
20
Dependency Tags: Prepositional Phrases
  • @PP_ADVL: head of an adverbial adjunct; e.g. ag an doras 'at the door'
  • @PP_ASP: head of an aspectual; e.g. ag rith '(at) running'
  • @PP_HAS: 'at X' meaning 'X has'; e.g. ag Seán 'Seán has', i.e. 'at Seán' (possession)
  • @PP_NEG: negative; e.g. gan dul 'without going'
  • @PP_OBL: oblique PP head; e.g. do Mháire 'to Máire'
  • @PP_SUBJ: prepositional subject pronoun; e.g. D'éirigh liom 'I succeeded', i.e. 'success was with me'
  • @PP_PRED: predicative; e.g. Is liom é 'It is mine', i.e. 'Is with me it' (ownership)
  • @PP_STAT: stative; e.g. ina rí 'is a king', i.e. 'in his king(hood)'
21
Parsing Methodology: Constraint Grammar
  • Aims (Karlsson et al., 1995)
  • assign the appropriate morphological and
    syntactic information according to the context of
    each token or larger structure in the text
  • assign an analysis to every string in the input,
    bearing in mind that unrestricted text will
    contain typographical errors, non-sentential
    fragments, dialectal and colloquial material
  • if an ambiguity cannot be resolved, the
    alternative analyses are retained rather than
    forcing a (possibly incorrect) choice

22
Constraint Grammar Principles
  • Differences between CG and other parsing
    methodologies (Karlsson, 1995, p37).
  • Unlike a context-free grammar, a Constraint
    Grammar does not attempt to define the set of
    grammatical sentences in a language.
  • ... everything is licensed which is not
    explicitly ruled out
  • makes it more robust in handling unrestricted
    text
  • Does not aim to produce a minimal set of general
    rules; a CG grammar can contain many
    lexically-specific rules to handle special cases.
  • Does not attempt to determine constituency
    structure.
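
The "licensed unless ruled out" principle, together with the aim of retaining alternatives when an ambiguity cannot be resolved, can be illustrated with a small Python sketch. It is a simplification under stated assumptions: readings are plain tuples of tags, `remove_readings` is a hypothetical helper rather than a CG engine call, and the Noun/Verb ambiguity is invented for illustration.

```python
def remove_readings(readings, should_remove):
    """Drop the readings a constraint rules out, but never the last one.

    If the constraint would discard every reading, it is not applied, so the
    token always keeps at least one analysis.
    """
    kept = [r for r in readings if not should_remove(r)]
    return kept if kept else list(readings)


# Two competing analyses for one token (illustrative tags only).
ambiguous = [("Noun",), ("Verb",)]

# A constraint that rules out the verb reading leaves one analysis.
only_noun = remove_readings(ambiguous, lambda r: "Verb" in r)

# A constraint that matches nothing leaves the ambiguity in place.
still_ambiguous = remove_readings(ambiguous, lambda r: "Adj" in r)

# A constraint that would rule out everything is not applied.
unchanged = remove_readings(ambiguous, lambda r: True)

print(only_noun)        # [('Noun',)]
print(still_ambiguous)  # [('Noun',), ('Verb',)]
print(unchanged)        # [('Noun',), ('Verb',)]
```

Because readings are only ever removed, unexpected input (typos, fragments) still comes out with some analysis, which is the robustness the slide refers to.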

23
CG Dependency Rules
  • MAP (@TAG) TARGET (POS) IF (CONDITIONS)
  • e.g.
  • MAP (@FMV) TARGET (Verb) IF (NOT 0 VSYNTH OR
    AUX) (NOT -1 RELPART) (NOT -2 RELPART)
  • SETS
  • LIST VSYNTH = (Verb 1P) (Verb 2P) (Verb 3P) (Verb
    Auto)
  • LIST AUX = ("bí") ("téigh") ("tosaigh")
    ("tosnaigh") ("féad") ("caith") ("féach")
  • LIST RELPART = (Vb Rel) (Prep Rel)
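
The logic of the example rule can be paraphrased in Python. This is only a sketch of what the rule checks, not how a CG engine evaluates it: tokens are dictionaries, a token matches a LIST entry when it carries all of that entry's tags (or that lemma), and the helper names `token`, `in_set`, `at` and `map_fmv` are hypothetical. The tags given for chuala are illustrative.

```python
# The three sets from the slide; each entry is a bundle of tags or a lemma.
VSYNTH = [{"Verb", "1P"}, {"Verb", "2P"}, {"Verb", "3P"}, {"Verb", "Auto"}]
AUX = [{"bí"}, {"téigh"}, {"tosaigh"}, {"tosnaigh"}, {"féad"}, {"caith"}, {"féach"}]
RELPART = [{"Vb", "Rel"}, {"Prep", "Rel"}]


def token(form, lemma, *tags):
    return {"form": form, "lemma": lemma, "tags": set(tags), "dep": []}


def in_set(tok, cg_set):
    """True if the token carries every feature of at least one set entry."""
    if tok is None:
        return False
    features = tok["tags"] | {tok["lemma"]}
    return any(entry <= features for entry in cg_set)


def at(tokens, i, offset):
    """Token at a relative position, or None outside the sentence."""
    j = i + offset
    return tokens[j] if 0 <= j < len(tokens) else None


def map_fmv(tokens):
    """MAP (@FMV) TARGET (Verb) IF (NOT 0 VSYNTH OR AUX) (NOT -1 RELPART) (NOT -2 RELPART)."""
    for i, tok in enumerate(tokens):
        if "Verb" not in tok["tags"]:
            continue  # TARGET (Verb)
        if in_set(tok, VSYNTH) or in_set(tok, AUX):
            continue  # NOT 0 VSYNTH OR AUX
        if in_set(at(tokens, i, -1), RELPART) or in_set(at(tokens, i, -2), RELPART):
            continue  # NOT -1 RELPART, NOT -2 RELPART
        tok["dep"].append("@FMV")
    return tokens


main_clause = map_fmv([
    token("Fuair", "faigh", "Verb", "VT", "PastInd"),
    token("sé", "sé", "Pron", "Pers", "3P", "Sg", "Masc", "Sbj"),
    token("leabhar", "leabhar", "Noun", "Masc", "Com", "Sg"),
])
relative = map_fmv([
    token("a", "a", "Part", "Vb", "Rel", "Direct"),
    token("chuala", "clois", "Verb", "VTI", "PastInd"),
])
print([t["dep"] for t in main_clause])  # [['@FMV'], [], []]
print([t["dep"] for t in relative])     # [[], []]
```

The verb after the relative particle is left untagged by this rule so that a separate rule can assign a relative tag such as @FMV_REL.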

24
Order of Implementation of Rules
  • Dependency Analysis is carried out in the
    following order
  • Clause Boundaries
  • Verbs and/or Copulas
  • Preposition Heads
  • All Dependent Modifiers
  • Subject
  • Predicates of Copular Constructions
  • Object(s)
  • Adverbials
  • Other
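
Why the order matters can be seen in a toy Python sketch: later passes rely on tags assigned by earlier ones. The three heuristics below are deliberately crude stand-ins invented for illustration (they are not the authors' CG rules), and the coarse POS labels are assumptions; only the ordering idea and the resulting tags for "Fuair sé leabhar ins an siopa" come from the slides.

```python
def tag_verbs(tokens):
    for t in tokens:
        if t["pos"] == "Verb":
            t["dep"] = "@FMV"


def tag_prep_phrases(tokens):
    # Preposition head, then its determiner and noun to the right.
    for i, t in enumerate(tokens):
        if t["pos"] != "Prep":
            continue
        t["dep"] = "@PP_ADVL"
        for u in tokens[i + 1:]:
            if u["pos"] == "Det":
                u["dep"] = "@>N"
            elif u["pos"] == "Noun":
                u["dep"] = "@P<"
                break
            else:
                break


def tag_next_nominal(tokens, after, new_tag):
    # First still-untagged noun/pronoun to the right of a token tagged `after`.
    for i, t in enumerate(tokens):
        if t["dep"] != after:
            continue
        for u in tokens[i + 1:]:
            if u["dep"] is None and u["pos"] in {"Noun", "Pron"}:
                u["dep"] = new_tag
                break


def analyse(words):
    tokens = [{"form": f, "pos": p, "dep": None} for f, p in words]
    # Order follows the slide: verbs, preposition heads and modifiers,
    # then subject, then object.
    tag_verbs(tokens)
    tag_prep_phrases(tokens)
    tag_next_nominal(tokens, "@FMV", "@SUBJ")
    tag_next_nominal(tokens, "@SUBJ", "@OBJ")
    return [t["dep"] for t in tokens]


sentence = [("Fuair", "Verb"), ("sé", "Pron"), ("leabhar", "Noun"),
            ("ins", "Prep"), ("an", "Det"), ("siopa", "Noun")]
print(analyse(sentence))
# ['@FMV', '@SUBJ', '@OBJ', '@PP_ADVL', '@>N', '@P<']
```

If the subject pass ran before the verb pass it would find no @FMV to anchor on and would tag nothing, which is the kind of dependency the ordering is meant to respect.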

25
Example (1)
  • Fuair    faigh    Verb VT PastInd
    sé       sé       Pron Pers 3P Sg Masc Sbj
    leabhar  leabhar  Noun Masc Com Sg
    ins      i        Prep Art Sg
    an       an       Art Sg Def
    siopa    siopa    Noun Masc Com Sg DefArt
  • Fuair  sé     leabhar  ins       an   siopa
    Got    he     book     in        the  shop
    V      Pro    N        Prep      Det  N
    @FMV   @SUBJ  @OBJ     @PP_ADVL  @>N  @P<
    'He got a book in the shop.'

27
Example (1)
  • root
    Fuair  sé     leabhar  ins       an   siopa
    Got    he     book     in        the  shop
    V      Pro    N        Prep      Det  N
    @FMV   @SUBJ  @OBJ     @PP_ADVL  @>N  @P<
    'He got a book in the shop.'

28
Example (2)
  • Chonaic  Máire  an   fear       a     bhí    ag       ithe
    Saw      Máire  the  man        that  was    at       eating
    V        N      Det  N          Rel   V      Prep     VN
    @FMV     @SUBJ  @>N  @SUBJ_REL  @>V   @FAUX  @PP_ASP  @P<
    'Máire saw the man that was eating.'
  • ag       ithe
    Prep     VN     (FORM)
    @PP_ASP  @P<    (FUNCTION)
    'eating'

29
Development/Test Cycle
30
Evaluation of Dependency Analysis
  • Sample Sentences: 225 short grammatical sentences

Precision (Test Suite)
  • Gold Standard Dependency Analysis Corpus
  • 250 sentences randomly selected from the 3,000
    sentence Gold Standard POS Tagged Corpus

Gold Standard Development Set (150 Sentences)
Tot Tokens  Punct. Tokens  Tokens  Correct  Incorrect  F-Score
4403        444            3959    3706     253        93.60

Gold Standard Test Set (100 Sentences)
Tot Tokens  Punct. Tokens  Tokens  Correct  Incorrect  F-Score
2555        282            2273    2143     130        94.28
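
The columns of these tables are related by simple arithmetic, which the following Python sketch reproduces. It assumes, from the numbers alone (the slides do not state it), that the reported score is the share of correctly tagged non-punctuation tokens; `token_accuracy` is a hypothetical helper name.

```python
def token_accuracy(total_tokens, punct_tokens, incorrect):
    """Return (scored tokens, correct tokens, percentage correct)."""
    scored = total_tokens - punct_tokens   # punctuation is excluded
    correct = scored - incorrect
    return scored, correct, 100.0 * correct / scored


dev = token_accuracy(total_tokens=4403, punct_tokens=444, incorrect=253)
test = token_accuracy(total_tokens=2555, punct_tokens=282, incorrect=130)

print(dev[0], dev[1], f"{dev[2]:.2f}")
print(test[0], test[1], f"{test[2]:.2f}")
```

Under this assumption the development figure comes out at about 93.61 (the slide shows 93.60, which looks like truncation rather than rounding) and the test figure at 94.28.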
31
Chunking
  • Using the Dependency Annotations and a Regular
    Expression Grammar (implemented using Xerox
    Finite-State Tools [1]) we can identify
    phrase-like structures, described by Abney (1991)
    as 'chunks'.
  • [1] For details see
    http://www.cis.upenn.edu/cis639/docs/xfst.html

32
Implementation
  • Regular expressions and Xerox FST
  • Chunks
  • NP .. , V .. etc.
  • PP with embedded NP
  • PP .. NP ..
  • Conjunction with embedded conjoint
  • CJ2 .. ?
  • NP úlla CJ2 agus NP oráistí NP
  • apples and oranges
  • Aspectual phrases
  • ASP PP-ASP .. NP .. (OA ..)
  • ASP PP-ASP ag NP dúnadh OA an dorais
  • closing the door

33
Example (3)
  • "<Tá>"    "bí"    Verb VI PresInd @FAUX           Is
    "<sé>"    "sé"    Pron Pers 3P Sg Masc Sbj @SUBJ  he
    "<ag>"    "ag"    Prep Simp @PP_ASP               at
    "<rith>"  "rith"  Verbal Noun VTI @P<             running
  • He is running
  • S
      V Tá bí Verb VI PresInd @FAUX
      NP sé sé Pron Pers 3P Sg Masc Sbj @SUBJ NP
      ASP
        PP-ASP ag ag Prep Simp @PP_ASP
          NP rith rith Verbal Noun VTI @P< NP
        PP-ASP
      ASP
    S

34
Regular Expression Chunker

  • Verb Chunk Dependency Tags

  • define VTag @FAUX | @FAUX_REL | @FMV | @FMV_REL
  • define VSTag @FAUX_SUBJ | @FAUX_REL_SUBJ |
    @FMV_SUBJ | @FMV_REL_SUBJ
  • define PreVTag @>V
  • Verb Pre/Post Modifiers
  • define PreVStr TokLemMTag PreVTag SP
  • Verb Chunk
  • define VStr TokLemMTag VTag SP
  • define VChunk PreVStr VStr
  • define VChunkBr VChunk @-> "V " ... " "
  • Verb_Subject Chunk
  • define VSStr TokLemMTag VSTag SP
  • define VSChunk PreVStr VSStr
  • define VSChunkBr VSChunk @-> "VS " ... " "
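
The same idea can be sketched with Python's `re` module in place of xfst. Several details here are assumptions made only for the sketch: each token is written as `form lemma+Tag+...+@DEP` with `+` separators, pre-verbal tokens are treated as optional and repeatable, and the chunk is wrapped in `[V ... ]`. The names `token_with` and `bracket_verb_chunks` are hypothetical, and the morphological tags in the example strings are illustrative.

```python
import re

V_TAGS = ["@FAUX", "@FAUX_REL", "@FMV", "@FMV_REL"]
PRE_V_TAGS = ["@>V"]


def token_with(tags):
    """Regex for one 'form lemma+tags' token whose final tag is one of `tags`."""
    alternatives = "|".join(re.escape(tag) for tag in tags)
    # (?<!\S) and (?!\S) anchor the match on whole whitespace-separated items,
    # so @FAUX does not match inside @FAUX_SUBJ.
    return rf"(?<!\S)\S+ \S+\+(?:{alternatives})(?!\S)"


PRE_V_STR = token_with(PRE_V_TAGS)
V_STR = token_with(V_TAGS)
# A verb chunk: any pre-verbal particles followed by the verb itself.
V_CHUNK = re.compile(rf"(?:{PRE_V_STR} )*{V_STR}")


def bracket_verb_chunks(tagged_text):
    """Wrap every verb chunk in labelled brackets, leaving other tokens alone."""
    return V_CHUNK.sub(lambda m: f"[V {m.group(0)} ]", tagged_text)


tagged = ("a a+Part+Vb+Rel+@>V bhí bí+Verb+PastInd+@FAUX "
          "ag ag+Prep+Simp+@PP_ASP ithe ith+Verbal+Noun+@P<")
print(bracket_verb_chunks(tagged))
```

Here the relative particle and the auxiliary end up inside one bracketed V chunk, while the aspectual phrase is left for later patterns, mirroring the cascade of definitions above.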

35
Example (4)
  • Tá       bí       Verb VI PresInd @FAUX      Is
    mé       mé       Pron Pers 1P Sg @SUBJ_ASP  I
    ag       ag       Prep Simp @PP_ASP          at
    déanamh  déanamh  Verbal Noun VTI @P<        making
    cáca     cáca     Noun Masc Gen Sg @OBJ_ASP  cake
    .        .        Punct Fin                  .
  • I am making a cake
  • S
      V Tá bí Verb VI PresInd @FAUX V
      NP mé mé Pron Pers 1P Sg @SUBJ_ASP NP
      ASP
        PP-ASP ag ag Prep Simp @PP_ASP
          NP déanamh déanamh Verbal Noun @P< NP
        PP-ASP
        OA cáca cáca Nn Msc Gen Sg @OBJ_ASP OA
      ASP
      . . Punct Fin
    S

36
Corpus Data
  • Ach sin an toradh is measa a fhéadfadh tarlú don
    pháirtí agus déarfaidís leat nár cóir an iomad
    airde a thabhairt do na pobalbhreitheanna nach
    raibh riamh fabhrach do na páirtithe beaga.
  • 'But that is the worst possible result for the
    party and they would say to you that it is not
    right to pay too much attention to the opinion
    polls that were never favourable to small
    parties.'

37
Dependency Analysis
  • S
  • CONJ Ach ach Conj Subord @CLB
  • COP Sin sin Cop Pro Dem @COP_SUBJ
  • NP an an Art Sg Def @>N toradh toradh Noun Msc Com Sg DefArt @PRED
  • is is Part Sup @>ADJ
  • measa olc Adj Comp @N< NP
  • VP a a Part Vb Rel Direct @CLB
  • fhéadfadh féad Verb VTI Cond Len @FAUX_REL
  • INF tarlú tarlú Verbal Noun VTI @INF INF
  • PP don do Prep Art Sg @PP_ADVL
  • NP pháirtí páirtí Noun Masc Com Sg Len @P< NP PP
  • CB agus agus Conj Coord @CLB
  • VS déarfaidís abair Verb VTI Cond 3P Pl @FMV_SUBJ
  • PP leat le Pron Prep 2P Sg @PP_ADVL PP
  • COP nár is Cop Past Rel Neg @CLB
  • PRED cóir cóir Adj Base @PRED
  • INF an an Art Sg Def @>N iomad iomad Subst Noun Sg @OBJ_INF
  • airde aird Noun Fem Gen Sg @N<
  • I a a Prep Simp @PP_INF thabhairt tabhairt Verbal Noun VTI Len @P< I INF
  • PP do do Prep Simp @PP_ADVL
  • NP na na Art Pl Def @>N pobalbhreitheanna pobalbhreith Noun Fem Com Pl @P< NP PP
  • V nach nach Part Vb Neg Rel @CLB
  • raibh bí Verb PastInd Neg Len @FMV_REL
  • PRED riamh riamh Adv Its @>ADJ fabhrach fabhrach Adj Base @PRED
  • PP do do Prep Simp @PP_ADVL
  • NP na na Art Pl Def @>N páirtithe páirtí Noun Masc Com Pl DefArt @P<
  • beaga beag Adj Com NotSlen Pl @N< NP PP
  • . Punct Fin S
38
Evaluation of Chunker
  • Evalb program used to evaluate bracketing of 250
    sentences

150 Development Set Sentences
                      ALL SENTENCES  SENTENCES Len<40
Number of sentences   150            120
Bracketing Recall     96.26          97.31
Bracketing Precision  98.15          98.57
Bracketing F-Measure  97.20          97.94

100 Test Set Sentences
                      ALL SENTENCES  SENTENCES Len<40
Number of sentences   100            85
Bracketing Recall     92.89          94.09
Bracketing Precision  94.12          94.09
Bracketing F-Measure  93.50          94.09
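
The F-measure rows follow from the recall and precision rows as their harmonic mean, F = 2PR / (P + R). A short Python check, with `f_measure` as a hypothetical helper name:

```python
def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs taken from the tables above.
results = {
    "dev, all sentences": (98.15, 96.26),
    "dev, len<40": (98.57, 97.31),
    "test, all sentences": (94.12, 92.89),
    "test, len<40": (94.09, 94.09),
}

for name, (precision, recall) in results.items():
    print(f"{name}: {f_measure(precision, recall):.2f}")
```

To two decimal places these agree with the reported values of 97.20, 97.94, 93.50 and 94.09.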
39
Future Work
  • Partial parsing to date, as we have not yet addressed:
  • Co-ordination
  • He packed his clothes and shoes
  • He packed his clothes and left
  • PP-attachment
  • He stabbed the man with the knife
  • He stabbed the man with the knife
  • PP-function
  • locative vs. stative
  • adjunct vs. indirect object
  • adding additional info in the FS Lexicons, e.g.
    noun sub-classes, subcategorisation frames for
    verbs
  • Irish Text Processing Tools
    http://www.scss.tcd.ie/Elaine.UiDhonnchadha/irish.utf8.htm