Valency Dictionary of Czech Verbs: Complex Tectogrammatical Annotation

Markéta Straňáková-Lopatková, Zdeněk Žabokrtský
Center for Computational Linguistics, Charles University, Prague
{stranak,zabokrtsky}_at_ckl.mff.cuni.cz
Verbal valency frame
  • a range of syntactic elements (verbal modifiers)
    either required or specifically permitted by the
    verb

Motivation
There is no wide-coverage valency lexicon
containing functors ("thematic roles"). Indeed,
there is no Czech lexicon where valency phenomena
are treated in a sufficiently systematic way!

[Figure: frame-slot labels EFF, ADDR, ACT]
Theoretical background
  • based on Functional Generative Description
    (Sgall et al., 1986)
  • closely related to the tectogrammatical tree
    structures of the Prague Dependency Treebank
    (Hajičová et al., 2001)
  • 38

Goals
  • to develop an annotation scheme, methodology and
    software tools for a tectogrammatically annotated
    valency dictionary of Czech verbs
  • to verify the approach on a small set of verbs
  • to keep maximal consistency for all captured
    phenomena
  • emphasis on both human and machine readability
    of the output

[Figure: vyměnit (to exchange); frame-slot label PAT]
Example of the TG-structure: "... od prodejců
vybírá poplatky každé ráno správce tržiště."
("... the janitor of a market-place collects fees
from the sellers every morning.")
What should the dictionary ideally capture?
  • for each verb
      • set of valency frames
  • for each valency frame
      • ordered sequence of frame slots
      • synonyms
      • examples of usage
      • reciprocity
      • control (equi/raising verbs)
      • possible type of reflexivity
      • type of passivisation
      • lemma of aspectual correlate(s)
      • type of usage (prim./second./idiom.)
      • semantic class (verba dicendi etc.)
      • pointer(s) to EuroWordNet synset(s)
      • number of occurrences in a text sample
  • for each frame slot
      • functor
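
The three levels listed above (verb, valency frame, frame slot) map naturally onto nested records. The following is only a minimal sketch of such a structure; the class and field names (VerbEntry, ValencyFrame, FrameSlot and their attributes) are hypothetical and are not taken from the authors' tools.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FrameSlot:
    functor: str  # e.g. "ACT", "PAT", "ADDR"


@dataclass
class ValencyFrame:
    slots: List[FrameSlot]                       # ordered sequence of frame slots
    synonyms: List[str] = field(default_factory=list)
    examples: List[str] = field(default_factory=list)
    reciprocity: Optional[str] = None            # e.g. "ACT-ADDR"
    control: Optional[str] = None
    reflexivity: Optional[str] = None
    passivisation: Optional[str] = None
    aspectual_correlates: List[str] = field(default_factory=list)
    usage: Optional[str] = None                  # prim. / second. / idiom.
    semantic_class: Optional[str] = None         # e.g. "dicendi"
    ewn_synsets: List[str] = field(default_factory=list)
    frequency: Optional[int] = None              # occurrences in a text sample


@dataclass
class VerbEntry:
    lemma: str
    frames: List[ValencyFrame] = field(default_factory=list)


# Illustrative entry: the idiomatic frame of "dodat" (to encourage somebody).
entry = VerbEntry(
    lemma="dodat",
    frames=[
        ValencyFrame(
            slots=[FrameSlot("ACT"), FrameSlot("ADDR"), FrameSlot("PAT")],
            reciprocity="ACT-ADDR",
            usage="idiom",
        )
    ],
)
```

Keeping the slots in a list rather than a set preserves the "ordered sequence" requirement named in the list.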

Processing steps
  • previously existing lexical data resources
  • automatic preprocessing
  • manual annotation
  • manual and automatic consistency checking
  • automatic postprocessing
  • resulting valency lexicon (XML)
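
To make the last steps of this pipeline concrete, here is a small sketch of what an automatic consistency check followed by XML output could look like. Everything in it is an assumption made for illustration: the functor set contains only the functors that appear in the dictionary sample (not the full inventory), and the element and attribute names (verb, frame, slot, functor, type) are invented, not the project's actual XML schema.

```python
import xml.etree.ElementTree as ET

# Assumption: only the functors and slot types visible in the sample entries.
KNOWN_FUNCTORS = {"ACT", "PAT", "ADDR", "EFF", "BEN", "DIR3"}
KNOWN_TYPES = {"obl", "opt", "typ"}


def check_frame(slots):
    """Return a list of problems found in a frame; an empty list means it passed."""
    problems = []
    for functor, slot_type in slots:
        if functor not in KNOWN_FUNCTORS:
            problems.append(f"unknown functor: {functor}")
        if slot_type not in KNOWN_TYPES:
            problems.append(f"unknown slot type: {slot_type}")
    return problems


def frame_to_xml(lemma, slots):
    """Serialize one frame of one verb to an XML string (hypothetical schema)."""
    verb = ET.Element("verb", lemma=lemma)
    frame = ET.SubElement(verb, "frame")
    for functor, slot_type in slots:
        ET.SubElement(frame, "slot", functor=functor, type=slot_type)
    return ET.tostring(verb, encoding="unicode")


good = [("ACT", "obl"), ("ADDR", "obl"), ("PAT", "obl")]
bad = [("ACT", "obl"), ("PTA", "obligatory")]  # deliberately inconsistent

good_problems = check_frame(good)
bad_problems = check_frame(bad)
xml_text = frame_to_xml("dodat", good)
```

Checks of this kind only catch mechanical slips such as misspelled labels; the manual part of the consistency checking step is not modelled here.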
Current State
  • 1000 verbs being annotated (350 finished)
  • circa 60% coverage on verbs in running text

Sample from the dictionary

předpokládat   to presuppose, to assume, to demand
  -aspect (imp.)

  ACT(1;obl) PAT(o+6;opt) EFF(4,že;obl)
    -synon brát za dané
    -example předpokládal o tom, že je to pravda; předpokládali o sobě (navzájem), že nelžou
             he presupposed about it that it was true; they assumed (one about the other) that they didn't tell lies
    -reciprocity ACT-PAT
    -use prim
    -class dicendi
    -ewn 1
    -freq 27

  ACT(1,inf,že,aby;obl) PAT(4,že;obl)
    -synon žádat
    -example tato práce předpokládá jistou zručnost; pracovat zde předpokládá zručnost
             this work demands certain skill; it demands skill to work here
    -control gen
    -use secondary
    -ewn 2
    -freq 3

dodat   to deliver, to supply, to add
  -aspect (perf), (imp) dodávat

  ACT(1;obl) PAT(4;obl) DIR3(obl) BEN(3,pro+4;typ)
    -synon dopravit
    -example dodat (někomu/pro někoho) někam zboží
             to deliver goods somewhere
    -note alter
    -use prim
    -freq 1

  ACT(1;obl) ADDR(3;obl) PAT(4;obl) DIR3(typ)
    -synon dopravit
    -example dodat někomu zboží (někam); dodali si (jeden druhému) zboží (někam)
             to deliver goods to somebody; they delivered goods to each other
    -note alter
    -use prim
    -reciprocity ACT-ADDR
    -freq 1

  ACT(1;obl) PAT(k+3;opt) EFF(4,že;obl)
    -synon říci
    -example dodal k tomu své připomínky / vše, co věděl
             to add a remark on something
    -use secondary
    -class dicendi
    -freq 18

  ACT(1;obl) ADDR(3;obl) PAT(4,2;obl)
    -example dodat někomu odvahu/odvahy; dodali si odvahy (jeden druhému)
             to encourage somebody
    -reciprocity ACT-ADDR
    -use idiom
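
Frame lines like those in the sample are regular enough to be read by a short script. The sketch below assumes each slot is written as FUNCTOR(forms;type), where forms is a comma-separated list of surface forms (case numbers, prepositional forms such as k+3, conjunctions) and type is obl, opt or typ. The semicolon separator is an assumption, since the transcript does not preserve it reliably, and the function name parse_frame is hypothetical.

```python
import re

# One slot: a functor name followed by a parenthesised description.
SLOT_RE = re.compile(r"(\w+)\(([^()]*)\)")


def parse_frame(line):
    """Split a frame line into a list of slot dictionaries."""
    slots = []
    for functor, inner in SLOT_RE.findall(line):
        # Everything before the last ';' is the list of forms; the rest is the type.
        forms_text, _, slot_type = inner.rpartition(";")
        forms = [f for f in forms_text.split(",") if f]
        slots.append({"functor": functor, "forms": forms, "type": slot_type})
    return slots


parsed = parse_frame("ACT(1;obl) PAT(k+3;opt) EFF(4,že;obl)")
no_forms = parse_frame("DIR3(typ)")
```

A slot without listed forms, such as DIR3(typ), comes back with an empty forms list, so downstream code does not need a special case for it.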
Conclusion
  • we can build an interesting and important
    language resource, but ...
  • ...creating high-quality data requires a lot of
    human effort; the task cannot be fully automated

Future work
  • finding hard-and-fast criteria for annotators'
    decisions
  • enlarging the current lexicon
  • intensive use of different language resources
  • linking to other languages