Title: Prague Arabic Dependency Treebank
1 Prague Arabic Dependency Treebank
Development in Data and Tools
Jan Hajič, Otakar Smrž, Petr Zemánek, Jan Šnaidauf, Emanuel Beška
Faculty of Mathematics and Physics, Faculty of Philosophy and Arts
Charles University in Prague
2 Project Release: PADT 1.0
- December 2004, Linguistic Data Consortium
- 148 000 Morpho, 113 500 Syntax
AFP   13 000   N/A      France Presse   Penn ATB 1
UMH   38 500   N/A      Ummah Press     Penn ATB 2
XIN   13 500   N/A      Xinhua News     A Gigaword
ALH   10 000   73 500   Al-Hayat News   A Gigaword
ANN   12 500   25 500   An-Nahar News   A Gigaword
XIA   26 500   49 500   Xinhua News     A Gigaword
3 Open-Source Tools
- TrEd Tree Editor
  - Multi-purpose annotation environment
  - Suite of programming utilities
- Netgraph Search Engine
  - Server/Client system architecture
  - Easy-to-learn query language
- EncodeArabic Perl Module
  - Extension for processing of Arabic script
  - ArabTeX, Buckwalter, Unicode,
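To illustrate the kind of conversion between Arabic-script encodings mentioned above, here is a minimal Python sketch of the standard Buckwalter transliteration table mapped to Unicode. It is not the EncodeArabic API (which is a Perl module); the function names `buckwalter_to_unicode` and `unicode_to_buckwalter` are hypothetical and chosen only for this example.

```python
# Sketch only: standard Buckwalter transliteration <-> Unicode code points.
# This is NOT the EncodeArabic interface; all names here are hypothetical.
BUCKWALTER_TO_UNICODE = {
    "'": "\u0621", "|": "\u0622", ">": "\u0623", "&": "\u0624",
    "<": "\u0625", "}": "\u0626", "A": "\u0627", "b": "\u0628",
    "p": "\u0629", "t": "\u062a", "v": "\u062b", "j": "\u062c",
    "H": "\u062d", "x": "\u062e", "d": "\u062f", "*": "\u0630",
    "r": "\u0631", "z": "\u0632", "s": "\u0633", "$": "\u0634",
    "S": "\u0635", "D": "\u0636", "T": "\u0637", "Z": "\u0638",
    "E": "\u0639", "g": "\u063a", "_": "\u0640", "f": "\u0641",
    "q": "\u0642", "k": "\u0643", "l": "\u0644", "m": "\u0645",
    "n": "\u0646", "h": "\u0647", "w": "\u0648", "Y": "\u0649",
    "y": "\u064a",
    # Diacritics: tanween, short vowels, shadda, sukun, dagger alif, alif wasla.
    "F": "\u064b", "N": "\u064c", "K": "\u064d", "a": "\u064e",
    "u": "\u064f", "i": "\u0650", "~": "\u0651", "o": "\u0652",
    "`": "\u0670", "{": "\u0671",
}

# The table is one-to-one, so it can simply be inverted.
UNICODE_TO_BUCKWALTER = {v: k for k, v in BUCKWALTER_TO_UNICODE.items()}


def buckwalter_to_unicode(text: str) -> str:
    """Map each Buckwalter symbol to Arabic script; pass other characters through."""
    return "".join(BUCKWALTER_TO_UNICODE.get(ch, ch) for ch in text)


def unicode_to_buckwalter(text: str) -> str:
    """Map Arabic script back to Buckwalter symbols; pass other characters through."""
    return "".join(UNICODE_TO_BUCKWALTER.get(ch, ch) for ch in text)


if __name__ == "__main__":
    word = buckwalter_to_unicode("kitAb")
    print(word, unicode_to_buckwalter(word))
```

Characters outside the table are passed through unchanged, so spaces and punctuation survive a round trip. Note that the glosses on the tree slides use a different, ArabTeX-like notation, which this table does not cover.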
4 PADT Functional Views
- Functional Generative Description
  - Theory of linguistic meaning and its expression
  - Prague Dependency Treebank for Czech
- Independence of representation levels
  - Tectogrammatical: linguistic meaning
  - Analytical: surface dependency syntax
  - Morphological: categories and lexical units
- Abstraction of the relations across levels
  - Strict distinction between form and function
  - Different units of description on each level
5 Functional Morphology
- Provides syntax levels with their abstract language, not just giving letters in tokens
- Revives multiple senses of categories
- Completeness of generation
- Strict modeling of grammatical control
- MorphoTrees human tagging
- Successful prototype feature-based tagger
6 Syntactic Levels of Description
- Analytical level
  - Pragmatically motivated, close to surface syntax
  - Every single token resulting from the morphological level forms one node
  - Tree-like dependency structure for every sentence
- Tectogrammatical level
  - Linguistic (literal) meaning, deep relations, TFA
  - Initial structures transformed from AL
  - Nodes for autosemantic words only
  - Decisive role of valency frames
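The step from analytical structures to initial tectogrammatical ones, with nodes for autosemantic words only, can be sketched roughly as follows. This is a deliberate simplification under a stated assumption: any node whose function label starts with `Aux` is treated as a function word and removed, and its dependents are reattached to the nearest kept ancestor. The actual transformation (valency frames, deep relations, TFA) is far richer. The `Node` class and `autosemantic_only` are hypothetical names, and the tree fragment is illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    form: str                      # token as written on the slides
    afun: str                      # analytical function label, e.g. "Sb", "AuxP"
    children: list[Node] = field(default_factory=list)


def autosemantic_only(node: Node) -> list[Node]:
    """Return the subtree with Aux* nodes removed.

    Assumption (not from the slides): labels starting with "Aux" mark
    function words. Their dependents move up to the nearest kept ancestor.
    """
    kept_children = [n for child in node.children for n in autosemantic_only(child)]
    if node.afun.startswith("Aux"):
        return kept_children       # drop this node, hand its dependents upward
    return [Node(node.form, node.afun, kept_children)]


if __name__ == "__main__":
    # Illustrative fragment: a noun governing a prepositional phrase.
    fragment = Node("iqtirAHu", "Sb", [Node("EalA", "AuxP", [Node("zumalAi", "Obj")])])
    (reduced,) = autosemantic_only(fragment)
    print(reduced.form, [c.form for c in reduced.children])
```

The function returns a list because a dropped node may hand several dependents to its parent; for a root that is itself kept, the list has exactly one element.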
7 Logic of Analytical Trees
- Concepts of dependency and valency
- Reduction: the sentence must retain grammatical correctness if leaves (terminal nodes) are chopped off
- Trees: clause components → clauses → sentences → paragraphs etc. Subtrees of clauses are exchangeable for non-clauses
- Nodes: words, tokenized parts of words, punctuation marks; marked by functions
- Edges: syntactic relations, governing node → dependent node/subtree
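The reduction idea above can be made concrete with a small Python sketch of a dependency tree whose leaves are chopped off one at a time. The code only enumerates the reduced token sequences; judging whether each one is still grammatical remains a linguistic decision. The `Node` class, the helper names, and the word order in the example are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    form: str                      # word or tokenized part of a word
    afun: str                      # function label of the node
    order: int                     # surface position (assumed for this example)
    children: list[Node] = field(default_factory=list)


def leaves(node: Node) -> list[Node]:
    """Collect terminal nodes of the subtree."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def without(node: Node, target: Node) -> Node:
    """Copy the subtree, omitting the node that is identical to `target`."""
    kept = [without(c, target) for c in node.children if c is not target]
    return Node(node.form, node.afun, node.order, kept)


def surface(root: Node) -> list[str]:
    """Return the forms of all nodes in surface order."""
    nodes, stack = [], [root]
    while stack:
        current = stack.pop()
        nodes.append(current)
        stack.extend(current.children)
    return [n.form for n in sorted(nodes, key=lambda n: n.order)]


def reductions(root: Node) -> list[tuple[str, list[str]]]:
    """For every leaf below the root, pair its form with the remaining sentence."""
    return [(leaf.form, surface(without(root, leaf)))
            for leaf in leaves(root) if leaf is not root]


if __name__ == "__main__":
    # Simplified verbal clause built from forms on the tree slides.
    tree = Node("dAma", "Pred", 1, [
        Node("iqtirAHu", "Sb", 2),
        Node("sAEatayni", "Adv", 3),
    ])
    for removed, rest in reductions(tree):
        print(f"without {removed}: {' '.join(rest)}")
```

Applying `reductions` repeatedly to each result walks the tree down toward its predicate, which mirrors the claim that the head node is the element that cannot be omitted.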
8 Some Syntax Issues of Arabic
- Non-verbal predication of several types
- Subordinate non-verbal clauses / modification
- Verb-like behavior of many nominal forms
- Mostly VSO in verbal sentences, but
  - vice-versa in non-verbal clauses
  - different, depending on context boundness
- Compound verbs, fixed composite prepositions
- Grammatical co-reference, accusative of inner object, complex referencing, etc.
9 Problem I: Predication
- Head node of tree: PREDICATE
  - Why? Steady role in sentence, cannot be omitted
- Verbal predicate: I-go to school
- Non-verbal predicate
  - Nominal: The-house a-big (the house is big)
  - Existential: There a-city (there is a city)
  - Prepositional
    - Possessive: For him a-house (he has a house)
    - Adverbial: The-mosque in the-city (the mosque is in the city)
  - Conjunctional: The-problem that (the problem is that ...)
10 Predication Types in Trees
Verbal
- dAma Pred lasted
- iqtirAHu Sb proposal
- sAEatayni Adv two-hours acc.
Verb-like behavior (object of noun?)
- -hu Atr his
- al-EamalIyata Obj the-operation acc.
- EalA AuxP on
- zumalAi Obj colleagues
- -hi Atr his
Nominal
- kabIrun Pnom a-big nom.
- al-baytu Sb the-house nom.
Existential
- vamata PredE there-is
- madInatun Sb a-city nom.
Prepositional (possessive)
- la- PredP for
- -hu Obj him
- baytun Sb a-house nom.
Prepositional (adverbial, locative)
- fI PredP in
- al-madInati Adv the-city gen.
- al-jAmiEu Sb the-mosque nom.
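Since the head node of each tree carries a predicate label, the predication type can be read off that label. The following Python sketch maps the four predicate labels that appear on this slide (`Pred`, `Pnom`, `PredE`, `PredP`) to the types listed under Problem I. The dictionary and function names are hypothetical, and no label is assumed for conjunctional predicates because none is shown here.

```python
# Sketch: classify a clause by the function label of its head node.
# Only labels visible on the slides are mapped; anything else is "unknown".
PREDICATION_BY_LABEL = {
    "Pred": "verbal",
    "Pnom": "nominal",
    "PredE": "existential",
    "PredP": "prepositional",
}


def predication_type(head_label: str) -> str:
    """Return the predication type for a head-node label, or "unknown"."""
    return PREDICATION_BY_LABEL.get(head_label, "unknown")


if __name__ == "__main__":
    # (form, label) pairs of head nodes taken from the slide.
    heads = [("dAma", "Pred"), ("kabIrun", "Pnom"),
             ("vamata", "PredE"), ("la-", "PredP"), ("fI", "PredP")]
    for form, label in heads:
        print(f"{form:10s} {label:6s} {predication_type(label)}")
```

The possessive and adverbial subtypes share the `PredP` label, so telling them apart would require looking at the preposition and its dependents rather than at the label alone.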
11 Problem II: Clauses and Co-reference
- Recursiveness: subordinate clause is contained as subtree in place of simple element
  - Head-node of clause gets the same function
  - Problem: non-verbal structures, clauses or not?
  - Compound verbs (mA zAla etc.) treated equally
- Grammatical co-reference: personal pronoun formally required by another element
  - Pronoun must be marked to be treated as such
  - Target of reference is unambiguously identifiable
  - Often in subordinate clauses, mostly attributive
  - Ex.: He-wrote a-book number its-pages hundred
12 Clauses and Co-reference in Trees
Attributive clause, prepositional predicate (adverbial)
- kataba Pred he-wrote
- al-rajulu Sb the-man nom.
- kitAban Obj a-book
- fI Atr_PredP in
- miatu Sb hundred nom.
- SafHatin Atr pages gen.
Referencing pronoun, as adverbial in clause
- -hi Adv_Ref it
Compound verb, formed as main verb and its complement
- mA AuxM not
- zAlat Pred she-stopped
- zaynabu Sb Zaynab
- tuHisu Atv she-feels
Objective clause, verbal predicate
- anna AuxC that
- tuEjibu Obj_Pred they-impress
- jumalan Sb sentences acc.
- -hA Obj her
Attributive clause, nominal predicate
- wADiHun Atr_Pnom clear nom.
- naHwu Sb grammar nom.
Referencing pronoun, as attribute in clause
- -hA Atr_Ref their
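The combined labels in these trees (`Atr_PredP`, `Obj_Pred`, `Atr_Pnom`, `Adv_Ref`, `Atr_Ref`) can be decomposed mechanically. The Python sketch below assumes, as a reading of the slides rather than a documented specification, that the part before the underscore is the function the node fills in its governing clause, and that the part after it is either the predicate type of an embedded clause or the `Ref` mark of a referencing pronoun. All names are hypothetical.

```python
from typing import NamedTuple, Optional

# Predicate labels seen on the slides; assumed to be the only clause markers.
PREDICATE_LABELS = {"Pred", "Pnom", "PredE", "PredP"}


class ParsedLabel(NamedTuple):
    function: str                  # role within the governing clause, e.g. "Atr"
    clause_predicate: Optional[str]  # predicate type if the node heads a clause
    is_reference: bool             # True for a marked referencing pronoun


def parse_label(label: str) -> ParsedLabel:
    """Split a combined label such as "Atr_PredP" or "Adv_Ref" into its parts."""
    function, _, suffix = label.partition("_")
    clause_predicate = suffix if suffix in PREDICATE_LABELS else None
    return ParsedLabel(function, clause_predicate, suffix == "Ref")


if __name__ == "__main__":
    for label in ["Pred", "Atr_PredP", "Obj_Pred", "Atr_Pnom", "Adv_Ref", "Atr_Ref"]:
        print(label, parse_label(label))
```

With this split, a node labeled `Atr_PredP` is both an attribute of its governor and the prepositional predicate of its own clause, which matches the statement that the head node of a clause gets the same function as a simple element in that position.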
13 Future Prospects
- Implementation of Functional Morphology
- Tectogrammatical annotation
- Lexicons of valency frames
- Re-training the feature-based tagger on MorphoTrees
- Machine-learning on the treebank data for various purposes
14 Thank you
- Questions welcome!
- http://ckl.mff.cuni.cz/padt/