Prague Arabic Dependency Treebank - PowerPoint PPT Presentation

About This Presentation
Title:

Prague Arabic Dependency Treebank

Description:

Project Release PADT 1.0. December 2004, Linguistic Data Consortium ... (adverbial, locative) Verbal. Verb-like behavior (object of noun?) September 23, 2004 ... – PowerPoint PPT presentation

Transcript and Presenter's Notes

1
Prague Arabic Dependency Treebank
Development in Data and Tools
Jan Hajič, Otakar Smrž, Petr Zemánek, Jan Šnaidauf, Emanuel Beška
Faculty of Mathematics and Physics, Faculty of
Philosophy and Arts, Charles University in Prague
2
Project Release PADT 1.0
  • December 2004, Linguistic Data Consortium
  • 148 000 Morpho, 113 500 Syntax

AFP   13 000   N/A      France Presse    Penn ATB 1
UMH   38 500   N/A      Ummah Press      Penn ATB 2
XIN   13 500   N/A      Xinhua News      A Gigaword
ALH   10 000   73 500   Al-Hayat News    A Gigaword
ANN   12 500   25 500   An-Nahar News    A Gigaword
XIA   26 500   49 500   Xinhua News      A Gigaword
3
Open-Source Tools
  • TrEd Tree Editor
      • Multi-purpose annotation environment
      • Suite of programming utilities
  • Netgraph Search Engine
      • Server/Client system architecture
      • Easy-to-learn query language
  • EncodeArabic Perl Module
      • Extension for processing of Arabic script
      • ArabTeX, Buckwalter, Unicode,
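
To make the idea of converting between encodings concrete, here is a minimal Python sketch of a Buckwalter-to-Unicode mapping. It is not the EncodeArabic module and does not reproduce its API. The function names are hypothetical, and the table is only a subset of the standard Buckwalter scheme (the hamza-carrier letters are left out). Characters outside the table are passed through unchanged.

```python
# Subset of the Buckwalter transliteration table (assumption: standard scheme,
# hamza-carrier variants omitted for brevity).
BW_LETTERS = "AbptvjHxd*rzs$SDTZEg"   # U+0627 (alif) .. U+063A (ghayn)
BW_MARKS = "_fqklmnhwYyFNKaui~o"      # U+0640 (tatweel) .. U+0652 (sukun)

BW_TO_AR = {"'": "\u0621"}            # hamza
BW_TO_AR.update({bw: chr(0x0627 + i) for i, bw in enumerate(BW_LETTERS)})
BW_TO_AR.update({bw: chr(0x0640 + i) for i, bw in enumerate(BW_MARKS)})
AR_TO_BW = {ar: bw for bw, ar in BW_TO_AR.items()}


def buckwalter_to_unicode(text: str) -> str:
    """Replace each known Buckwalter character with its Arabic code point."""
    return "".join(BW_TO_AR.get(ch, ch) for ch in text)


def unicode_to_buckwalter(text: str) -> str:
    """Inverse mapping; unknown characters are passed through unchanged."""
    return "".join(AR_TO_BW.get(ch, ch) for ch in text)


if __name__ == "__main__":
    arabic = buckwalter_to_unicode("kitAban")
    print(arabic)
    print(unicode_to_buckwalter(arabic))
```

Because both directions are plain dictionary lookups, a round trip over text that uses only the covered characters returns the input unchanged.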

4
PADT Functional Views
  • Functional Generative Description
      • Theory of linguistic meaning and its expression
      • Prague Dependency Treebank for Czech
  • Independence of representation levels
      • Tectogrammatical: linguistic meaning
      • Analytical: surface dependency syntax
      • Morphological: categories and lexical units
  • Abstraction of the relations across levels
      • Strict distinction between form and function
      • Different units of description on each level

5
Functional Morphology
  • Provides syntax levels with their abstract
    language, not just giving letters in tokens
  • Revives multiple senses of categories
  • Completeness of generation
  • Strict modeling of grammatical control
  • MorphoTrees human tagging
  • Successful prototype feature-based tagger

6
Syntactic Levels of Description
  • Analytical level
      • Pragmatically motivated, close to surface syntax
      • Every single token resulting from the morphological
        level forms one node
      • Tree-like dependency structure for every sentence
  • Tectogrammatical level
      • Linguistic (literal) meaning, deep relations, TFA
      • Initial structures transformed from AL
      • Nodes for autosemantic words only
      • Decisive role of valency frames

7
Logic of Analytical Trees
  • Concepts of dependency and valency
  • Reduction: a sentence must remain grammatically
    correct if leaves (terminal nodes) are
    chopped off
  • Trees: clause components → clauses → sentences →
    paragraphs, etc. Subtrees of clauses are exchangeable
    for non-clauses
  • Nodes: words, tokenized parts of words,
    punctuation marks, marked by functions
  • Edges: syntactic relations, governing node →
    dependent node/subtree
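
The node, edge, and reduction notions above can be sketched as a small Python data structure. This is an illustrative sketch only, not the TrEd data model. The class and function names are hypothetical, and the attachment of the two dependents under kataba is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison; avoids recursing through parent links
class Node:
    form: str                      # token as it appears in the sentence
    afun: str                      # analytical function, e.g. Pred, Sb, Obj
    gloss: str = ""
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        """Attach child as a dependent of this (governing) node."""
        child.parent = self
        self.children.append(child)
        return child


def leaves(root: Node) -> List[Node]:
    """Collect terminal nodes in depth-first order."""
    if not root.children:
        return [root]
    return [leaf for child in root.children for leaf in leaves(child)]


def chop_leaf(leaf: Node) -> None:
    """Reduction step: remove a terminal node from its governing node."""
    if leaf.children or leaf.parent is None:
        raise ValueError("only non-root leaves can be chopped off")
    leaf.parent.children.remove(leaf)
    leaf.parent = None


def forms(root: Node) -> List[str]:
    """Governing node first, then its dependents (pre-order)."""
    return [root.form] + [f for child in root.children for f in forms(child)]


if __name__ == "__main__":
    root = Node("kataba", "Pred", "he-wrote")
    root.add(Node("al-rajulu", "Sb", "the-man"))       # assumed attachment
    obj = root.add(Node("kitAban", "Obj", "a-book"))   # assumed attachment
    chop_leaf(obj)
    print(forms(root))
```

After the object leaf is chopped off, the remaining tree still has the predicate as its head, which is the property the reduction test relies on.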

8
Some Syntax Issues of Arabic
  • Non-verbal predication of several types
  • Subordinate non-verbal clauses / modification
  • Verb-like behavior of many nominal forms
  • Mostly VSO in verbal sentences, but
      • vice versa in non-verbal clauses
      • different, depending on context boundness
  • Compound verbs, fixed composite prepositions
  • Grammatical co-reference, accusative of inner
    object, complex referencing, etc.

9
Problem I: Predication
  • Head node of tree: PREDICATE
      • Why? Steady role in sentence, cannot be omitted
  • Verbal predicate: I-go to school
  • Non-verbal predicate
      • Nominal: The-house a-big (the house is big)
      • Existential: There a-city (there is a city)
      • Prepositional
          • Possessive: For him a-house (he has a house)
          • Adverbial: The-mosque in the-city (is)
      • Conjunctional: The-problem that (is that)
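
As a rough illustration of how the predication types could be told apart in annotated data, the following Python sketch maps predicate function tags to types. The tags Pred, Pnom, PredE, and PredP are the ones that appear on the tree slides. Pairing them with the type names here is my reading of those slides, not a documented scheme, and no tag for the conjunctional type is assumed. The function name and the sample node lists are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumed pairing of predicate tags with the predication types listed above.
PREDICATE_TYPES = {
    "Pred": "verbal",
    "Pnom": "nominal",
    "PredE": "existential",
    "PredP": "prepositional",
}


def find_predicate(nodes: List[Tuple[str, str]]) -> Optional[Tuple[str, str]]:
    """Return (form, predication type) for the first node with a predicate tag."""
    for form, afun in nodes:
        if afun in PREDICATE_TYPES:
            return form, PREDICATE_TYPES[afun]
    return None


if __name__ == "__main__":
    # Hypothetical flat node lists as (form, function) pairs.
    possessive = [("la-", "PredP"), ("-hu", "Obj"), ("baytun", "Sb")]
    nominal = [("al-baytu", "Sb"), ("kabIrun", "Pnom")]
    print(find_predicate(possessive))
    print(find_predicate(nominal))
```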

10
Predication Types in Trees
(nodes given as form, function, gloss)

Verbal
  dAma            Pred    lasted
  iqtirAHu        Sb      proposal
  sAEatayni       Adv     two-hours acc.

Verb-like behavior (object of noun?)
  -hu             Atr     his
  al-EamalIyata   Obj     the-operation acc.
  EalA            AuxP    on
  zumalAi         Obj     colleagues
  -hi             Atr     his

Nominal
  al-baytu        Sb      the-house nom.
  kabIrun         Pnom    a-big nom.

Existential
  vamata          PredE   there-is
  madInatun       Sb      a-city nom.

Prepositional (possessive)
  la-             PredP   for
  -hu             Obj     him
  baytun          Sb      a-house nom.

Prepositional (adverbial, locative)
  al-jAmiEu       Sb      the-mosque nom.
  fI              PredP   in
  al-madInati     Adv     the-city gen.
11
Problem II: Clauses and Co-reference
  • Recursiveness: a subordinate clause is contained
    as a subtree in place of a simple element
      • Head node of the clause gets the same function
      • Problem: non-verbal structures, clauses or not?
      • Compound verbs (mA zAla etc.) treated equally
  • Grammatical co-reference: a personal pronoun
    formally required by another element
      • Pronoun must be marked to be treated as such
      • Target of reference is unambiguously identifiable
      • Often in subordinate clauses, mostly attributive
      • Ex.: He-wrote a-book number its-pages
        hundred
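
The tree slides use compound labels such as Atr_PredP, Obj_Pred, Atr_Pnom, Adv_Ref, and Atr_Ref. One way to read them is as a clause-level function followed by either the predicate type of the clause head or a marker for a referencing pronoun. The Python sketch below splits labels along those lines. This reading and all names in the code are assumptions for illustration, not the treebank's documented label grammar.

```python
from typing import NamedTuple, Optional

PREDICATE_TAGS = {"Pred", "Pnom", "PredE", "PredP"}  # tags seen on the tree slides


class Afun(NamedTuple):
    function: str              # role within the governing structure, e.g. Atr
    predicate: Optional[str]   # predicate tag if the node heads a clause
    is_reference: bool         # True for a pronoun marked with _Ref


def parse_afun(label: str) -> Afun:
    """Split a label such as 'Atr_PredP' or 'Adv_Ref' into its parts."""
    function, _, suffix = label.partition("_")
    predicate = suffix if suffix in PREDICATE_TAGS else None
    return Afun(function, predicate, suffix == "Ref")


if __name__ == "__main__":
    for label in ("Atr_PredP", "Obj_Pred", "Adv_Ref", "Sb"):
        print(label, parse_afun(label))
```

Keeping the clause function and the predicate type as separate fields mirrors the point that the head node of a subordinate clause takes the function a simple element would have in the same position.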

12
Clauses and Co-reference in Trees
(nodes given as form, function, gloss)

Compound verb, formed as main verb and its complement
  mA          AuxM        not
  zAlat       Pred        she-stopped
  zaynabu     Sb          Zaynab
  tuHisu      Atv         she-feels
  anna        AuxC        that
  jumalan     Sb          sentences acc.
  wADiHun     Atr_Pnom    clear nom.      (attributive clause, nominal predicate)
  naHwu       Sb          grammar nom.
  -hA         Atr_Ref     their           (referencing pronoun, as attribute in clause)
  tuEjibu     Obj_Pred    they-impress    (objective clause, verbal predicate)
  -hA         Obj         her

Attributive clause, prepositional predicate (adverbial)
  kataba      Pred        he-wrote
  al-rajulu   Sb          the-man nom.
  kitAban     Obj         a-book
  fI          Atr_PredP   in
  -hi         Adv_Ref     it              (referencing pronoun, as adverbial in clause)
  miatu       Sb          hundred nom.
  SafHatin    Atr         pages gen.
13
Future Prospects
  • Implementation of Functional Morphology
  • Tectogrammatical annotation
  • Lexicons of valency frames
  • Re-training the feature-based tagger on
    MorphoTrees
  • Machine-learning on the treebank data for various
    purposes

14
Thank you
  • Questions welcome!
  • http://ckl.mff.cuni.cz/padt/