Title: Issues of Valency in Prague Dependency Treebank: Creating Valency Lexicon of Verbs
1 Issues of Valency in Prague Dependency Treebank
Creating Valency Lexicon of Verbs
- Markéta Lopatková
- Center for Computational Linguistics
- MFF UK, Prague
CIL XVII, Prague, July 26, 2003
2 Motivation
- traditional linguistics
- source of data for linguistic research
- verification of theoretical criteria set up
- natural language processing
- lemmatization
- morphological tagging
- syntactic analysis
- word sense disambiguation
- semantic analysis
- machine translation
- building other resources
- language acquisition
3 Syntactic vs. semantic approach I.
- Levin Verb Classes (Levin, 1993)
- hypothesis: syntactic features of verbs are semantically determined
- method: syntactic behavior → semantic classes
- alternation: a change in the realization of the argument structure of a verb
- conative alternation
- Edith cuts the bread → Edith cuts at the bread
- classes: verbs which undergo certain types of alternations
4 Syntactic vs. semantic approach II.
- PropBank (Palmer et al., 2001)
- layer of semantic annotation in Penn Treebank
- argument structure for verbs
- arguments: Arg0, ... Arg5
- modifiers: ArgM (LOC, TMP, EXT, PRP, ADV)
- He was drawing diagrams and sketches for his patron.
- Arg0: he
- Rel: drawing
- Arg1: diagrams and sketches
- Arg2-for: his patron
- He keeps st in the fridge.
- Arg0: he
- Rel: keeps
- Arg1: st
- Arg2-in: the fridge
- (also Hajičová, Kučerová, 2002)
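The label inventory on this slide can be written down as a small check. This is only a sketch of the slide's notation, not PropBank's own tooling: the function name `is_valid_label` is hypothetical, and reading the `-for` / `-in` suffix as an attached preposition is an assumption based on the two examples.

```python
import re

# Numbered arguments Arg0..Arg5, optionally followed by a preposition
# suffix as on the slide ("Arg2-for", "Arg2-in"). The suffix reading is
# an assumption drawn from the two examples only.
NUMBERED = re.compile(r"Arg[0-5](-[a-z]+)?")

# Function tags listed on the slide for ArgM.
MODIFIER_TAGS = {"LOC", "TMP", "EXT", "PRP", "ADV"}


def is_valid_label(label):
    """Return True if the label fits the inventory shown on the slide."""
    if label == "Rel":
        return True
    if label.startswith("ArgM-"):
        return label[len("ArgM-"):] in MODIFIER_TAGS
    return NUMBERED.fullmatch(label) is not None


# The two annotated sentences from the slide.
drawing = {"Arg0": "he", "Rel": "drawing",
           "Arg1": "diagrams and sketches", "Arg2-for": "his patron"}
keeps = {"Arg0": "he", "Rel": "keeps", "Arg1": "st", "Arg2-in": "the fridge"}

for annotation in (drawing, keeps):
    print(all(is_valid_label(label) for label in annotation))
```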
5 Syntactic vs. semantic approach III.
- FrameNet (Fillmore, 2002)
- it groups lexical items with parallel semantic characterization
- the structure and particular components correspond to semantic roles of the common semantic frame
- verbs, nouns, adjectives, prepositions
- Communication
- Speaker, Message, Addressee, Topic, Medium
- Tom communicates with Kim about the festival.
- Tom communicates with Kim by letter.
- Tom communicates the message to me.
- Reciprocality
- Protagonists: Prot-1, Prot-2
- Tom fought with Kim.
- Tom and Kim fought.
6 Syntactic vs. semantic approach IV.
- LCS Database (Lexical Conceptual Structure) (Dorr, 2001)
- semantic representation
- semantic structure and semantic content
- verb cut down, lexical item:
- (act_on loc (thing 1) (thing 2) ((on 23) loc (head) (thing 24)) (cutingly 26) (down/m))
- sentence: United States cut down (the) quota.
- (act_on loc (us) (quota) ((on 23) loc (head) (thing 24)) (cutingly 26) (down/m))
- logical arguments (ag, exp, th, src, goal, info, perc, loc, poss, time, prop)
- logical modifiers (mod-poss, ben, instr, purp, mod-loc, manner, mod-prop)
- cut down: _ag_th,mod-loc(on)
7 Prague Dependency Treebank
- based on
- Functional Generative Description (FGD) (Sgall et al., 1986)
- dependency-oriented
- stratificational
- level of underlying representation (tectogrammatical level) (described in Hajičová et al., 2000)
- valency theory (esp. Panevová, 1994)
8 Valency in FGD I.
- complementations
- inner participants vs. free modifications
- obligatory vs. optional
- valency frame
- Matka.ACT předělala dětem.ADDR loutku.PAT z Kašpárka.ORIG na čerta.EFF.
- Mother re-made a puppet for the children from a Punch into an imp. (Panevová)
- V Praze.LOC se sejdeme na Hlavním nádraží.LOC u pokladen.LOC. (Panevová)
- In Prague we will meet at the Main Station near the booking office.
- (table: inner participants and free modifications, each either obligatory or optional)
9 Valency in FGD II.
- a middle position
- syntactic criteria are used for the identification of Actor and Patient (Actor is the first inner participant, the second is always a Patient)
- other inner participants (Addressee, Origin and Effect) as well as free modifications are determined in accordance with semantic considerations
- concept of shifting (Panevová, 1974-75)
- (diagram of shifting: Actor, Patient, Addressee, Origin, Effect)
- Kniha.ACT vyšla. (Panevová) The book came out.
- Chlapec.ACT vyrostl v muže.PAT. (Panevová) A boy grew up into a man.
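The positional rule stated on this slide can be sketched in a few lines. This is a deliberate simplification, not the procedure of Panevová (1974-75): the function name `assign_functors` is hypothetical, the input is assumed to be already ranked (which cognitive role comes first is outside the sketch), and reading the book in "Kniha vyšla" as a cognitive Patient is an interpretation of the example.

```python
def assign_functors(ranked_cognitive_roles):
    """Map ranked cognitive roles of inner participants to functors.

    Following the slide: the first inner participant is always ACT,
    the second is always PAT; any further participants keep the label
    chosen on semantic grounds (ADDR, ORIG, EFF).
    The ranking of the input is an assumption supplied by the caller.
    """
    functors = []
    for position, role in enumerate(ranked_cognitive_roles):
        if position == 0:
            functors.append("ACT")   # shifted to Actor if needed
        elif position == 1:
            functors.append("PAT")   # shifted to Patient if needed
        else:
            functors.append(role)    # semantic label is kept
    return functors


# Kniha.ACT vyšla: a single participant becomes ACT.
print(assign_functors(["PAT"]))
# Chlapec.ACT vyrostl v muže.PAT: a cognitive Effect is shifted to PAT.
print(assign_functors(["ACT", "EFF"]))
# With all five inner participants present, nothing is shifted.
print(assign_functors(["ACT", "PAT", "ADDR", "ORIG", "EFF"]))
```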
10 Valency in FGD III.
- valency of autosemantic words
- verbs (Panevová, from the seventies)
- 5 inner participants: Actor, Patient, Addressee, Origin, Effect
- approx. 45 free modifications
- shifting of cognitive roles for inner participants
- nouns (esp. Panevová, 2000, Řezníčková, manuscript)
- verbal complementations
- specific nominal complementations: Identity, Partitive, Appurtenance, Restrictive and Descriptive Attribute
- adjectives (Panevová, 1998)
- verbal complementations
- specific adjectival complementations
11 Valency structure on TR level of PDT
- the core of annotation on the tectogrammatical level
- problem of consistency → valency lexicon
- verbs
- two branches
- lists of verbs with their complementations being created and used by annotators (PDT-VALLEX)
- complex valency lexicon (VALLEX)
- nouns
- the theoretical aspects and methodology are currently being refined (Řezníčková, manuscript)
- lists of nouns with their complementations
- adjectives
- lists of adjectives with their complementations
12 Valency lexicon of verbs: PDT-VALLEX
- lists being created and used by annotators
- valency frames of verbs in their particular meanings, as they appear during annotation; the lexeme as a whole is not analyzed
- the information specifying elements of frames
- functor - i.e. name of complementation
- type - obligatory / optional
- possible morphemic form(s)
- example(s)
- it serves for consistency of annotation
- approx. 4,700 verbs with 7,150 valency frames (i.e. 1.5 frames per verb)
- dát "to give" ... ACT(1obl) ADDR(3obl) PAT(4obl)
- dát někomu knihu "to give sb a book"
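The four pieces of information listed for each frame element (functor, type, morphemic forms, examples) map naturally onto a small record type. The following is a minimal sketch, not the actual PDT-VALLEX file format: the class names `FrameElement` and `ValencyFrame` are hypothetical, and the comma used to join several morphemic forms is an assumption, since the slide shows only single forms.

```python
from dataclasses import dataclass, field


@dataclass
class FrameElement:
    functor: str       # name of the complementation, e.g. "ACT"
    obligatory: bool   # type: obligatory (True) or optional (False)
    forms: list        # possible morphemic forms, e.g. ["4"] for accusative

    def render(self):
        kind = "obl" if self.obligatory else "opt"
        # Joining several forms with "," is an assumption of this sketch.
        return f"{self.functor}({','.join(self.forms)}{kind})"


@dataclass
class ValencyFrame:
    lemma: str
    gloss: str
    elements: list
    examples: list = field(default_factory=list)

    def render(self):
        return " ".join(element.render() for element in self.elements)


# The entry shown on the slide.
dat = ValencyFrame(
    lemma="dát",
    gloss="to give",
    elements=[
        FrameElement("ACT", True, ["1"]),
        FrameElement("ADDR", True, ["3"]),
        FrameElement("PAT", True, ["4"]),
    ],
    examples=["dát někomu knihu"],
)

print(dat.render())  # ACT(1obl) ADDR(3obl) PAT(4obl)
```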
13 Valency lexicon of verbs: VALLEX
- complex information on the whole verb lexeme in all its meanings (Lopatková, Žabokrtský, 2002)
- the information on particular valency frames, corresponding to its meanings (described with gloss(es) and example(s))
- the information specifying elements of frames
- functor - i.e. name of complementation
- type - obligatory / optional
- possible morphemic form(s)
- mluvit "to speak"
- ... ACT(1obl) ADDR(s7obl) PAT(o6opt)
- mluvila s ním o dětech "she spoke with him about their children"
- additional syntactic information
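The frame notation used on this slide is regular enough to be read mechanically. The sketch below assumes one reading of it, which the slide does not spell out: a digit 1-7 is a morphological case, an optional lowercase prefix is a preposition (so `s7` is "s" + instrumental and `o6` is "o" + locative), and `obl` / `opt` mark obligatoriness. The function name `parse_frame` is hypothetical and this is not the VALLEX file format.

```python
import re

# FUNCTOR ( optional preposition, case digit, obl|opt )
SLOT = re.compile(r"([A-Z]+)\(([a-z]*)([1-7])(obl|opt)\)")


def parse_frame(text):
    """Parse slide-style notation such as 'ACT(1obl) ADDR(s7obl)'."""
    slots = []
    for functor, preposition, case, kind in SLOT.findall(text):
        slots.append({
            "functor": functor,
            "preposition": preposition or None,  # None for a bare case
            "case": int(case),
            "obligatory": kind == "obl",
        })
    return slots


# The frame given for mluvit "to speak".
for slot in parse_frame("ACT(1obl) ADDR(s7obl) PAT(o6opt)"):
    print(slot)
```

In the example sentence, "s ním" fills the ADDR slot (preposition s with case 7) and "o dětech" fills the optional PAT slot (preposition o with case 6).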
14 Valency lexicon of verbs: VALLEX II.
- additional syntactic information for particular valency frames
- reflexivity (in progress)
- reciprocity
- control
- aspect and aspectual counterparts
- possible diatheses, passivization (future plans)
- primary / secondary / idiomatic usage
- syntactic/semantic class (in progress)
- pointers to Czech EuroWordNet (in progress)
- frequency of a particular frame in samples of ČNK (60 occurrences of each verb lexeme)
15 Valency lexicon of verbs: VALLEX III.
- current state
- 1,400 verbs with 3,860 frames (i.e. 2.7 frames per verb)
- verbs chosen according to their frequency in Czech National Corpus and PDT
- they cover about 85 % of running text in PDT
- open questions
- enriched valency frame
- syntactic-semantic classes
- alternative frames
- frozen collocations
16 Valency lexicon of verbs
- Why two branches?
- PDT-VALLEX: extensive
- necessary for annotation
- recall improves relatively quickly
- VALLEX: intensive
- the whole lexeme is analyzed en bloc → adequate and consistent description
- precision improves
- the two branches are supposed to be merged
- PDT-VALLEX: valuable source for VALLEX
18 Enriched valency frames I.
- inner participants
- each inner participant can occur only once (with single occurrence of a verb)
- combination of inner participants must be listed for a particular verb
- morphemic form is predicted by the governing verb
- concept of shifting is applied
- free modifications
- each free modification can be repeated
- syntactically, they can modify any verb (only semantic restrictions are often present)
- they have typical semantics
- they do not undergo the shifting
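The repeatability contrast above can be expressed as a simple check over the functors attached to one verb occurrence. This is an illustrative sketch only: the function name `repeated_inner_participants` is hypothetical, and the inputs are functor lists written by hand from the example sentences earlier in the talk, not output of any annotation tool.

```python
from collections import Counter

# The five inner participants of FGD.
INNER_PARTICIPANTS = {"ACT", "PAT", "ADDR", "ORIG", "EFF"}


def repeated_inner_participants(functors):
    """Return inner participants that occur more than once with one verb.

    Free modifications (anything outside INNER_PARTICIPANTS) may repeat
    freely and are therefore never reported.
    """
    counts = Counter(functors)
    return sorted(
        functor for functor, n in counts.items()
        if functor in INNER_PARTICIPANTS and n > 1
    )


# "V Praze.LOC se sejdeme na Hlavním nádraží.LOC u pokladen.LOC."
# LOC is a free modification, so three occurrences are fine.
print(repeated_inner_participants(["LOC", "LOC", "LOC"]))
# "Matka.ACT předělala dětem.ADDR loutku.PAT z Kašpárka.ORIG na čerta.EFF."
print(repeated_inner_participants(["ACT", "ADDR", "PAT", "ORIG", "EFF"]))
# A constructed violation: two Patients with a single verb occurrence.
print(repeated_inner_participants(["ACT", "PAT", "PAT"]))
```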
19 Enriched valency frames II.
- quasi-valency complementations (also Panevová, 2003)
- each quasi-valency complementation can occur only once (with any occurrence of a verb)
- each quasi-valency complementation is characteristic for a limited list of verbs
- morphemic form is predicted by the governing verb
- they have typical semantics
- they do not undergo the shifting
- Obstacle: uhodit hlavou o větev.OBST "to bump one's head against a bough"
- zavadit o stůl.OBST "to brush against a table"
- Difference: prodloužit o hodinu.DIFF "to prolong by one hour"
- Mediator: vzít někoho za ruku.MDT "to take sb by his/her hand"
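Two of the properties listed for quasi-valency complementations, single occurrence and restriction to a limited list of verbs, lend themselves to a lookup-based check. The sketch below is hypothetical throughout: the function name `check_quasi_valency` is invented, and the verb lists contain only the verbs that appear in the examples on this slide, so they are far from a real inventory.

```python
# Verbs licensed for each quasi-valency functor. Only the slide's own
# examples are listed here; a real lexicon would have longer lists.
QUASI_VALENCY = {
    "OBST": {"uhodit", "zavadit"},
    "DIFF": {"prodloužit"},
    "MDT": {"vzít"},
}


def check_quasi_valency(verb, functors):
    """Report violations of the two constraints for one verb occurrence."""
    problems = []
    for functor, licensed_verbs in QUASI_VALENCY.items():
        n = functors.count(functor)
        if n > 1:
            problems.append(f"{functor} occurs {n} times")
        if n >= 1 and verb not in licensed_verbs:
            problems.append(f"{functor} is not listed for {verb}")
    return problems


# zavadit o stůl.OBST: licensed, single occurrence.
print(check_quasi_valency("zavadit", ["ACT", "OBST"]))
# A constructed violation: OBST with a verb outside the list.
print(check_quasi_valency("jít", ["ACT", "OBST"]))
# A constructed violation: DIFF repeated with one verb occurrence.
print(check_quasi_valency("prodloužit", ["ACT", "PAT", "DIFF", "DIFF"]))
```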
20 Enriched valency frames III.
- typical modifications
- optional free modifications commonly used with a verb
- usually modify group of verbs with similar meaning
- morphemic form
- prototypical for some modifications
- e.g. Dative case or prep. group pro "for" + Acc for Benefactor
- determined by the typical semantics of the modifying members
- e.g. prep. groups na "on" + Loc and v "in" + Loc typically specify Location
- verbs of motion typically modified by Direction modification (provided that Direction is not obligatory)
- jít do kina / přes les / jít z domova
- to go to the cinema / through the wood / from home
- verbs of exchange typically modified by modification of Recompense
- dát / dostat / získat / kupovat / brát něco.PAT za něco.RCMP
- to give / get / obtain / buy / take something for something
21 Exploitation of the valency lexicon
- reaching the consistency of assigning the valency structure (PDT-VALLEX)
- automatic syntactic analysis (shallow parsing)
- tectogrammatical parser
- automatic system for creating an underlying representation of Czech sentences
- source data for building the valency lexicon of nouns
22 Resources
- theoretical articles on valency (Panevová)
- The Manual for Tectogrammatical Tagging of the Prague Dependency Treebank (Hajičová et al., 2000)
- lists of particular valency frames created by annotators
- electronic valency dictionary of surface realizations of verbal modifiers (FI MU Brno, Pala, Ševeček, 1997)
- printed dictionaries
- Slovesa pro praxi (SPP, 1997), valency specification of 767 most frequent verbs
- Slovník spisovného jazyka českého (SSJČ, 1964)
- Slovník spisovné češtiny pro školu a veřejnost (SSČ, 1978)
- Slovník českých synonym (SČS, 1994)
- Slovník české frazeologie a idiomatiky (SČFI, 1983)
- Czech National Corpus (ČNK)
- EuroWordNet, Czech WordNet
23 References I.
- Dorr, B.J. (2001) LCS Verb Database, Online Software Database of Conceptual Structures and Documentations, UCMP.
- Fillmore, Ch. (2002) FrameNet and the Linking between Semantic and Syntactic Relations. In COLING 2002, Proceedings, pp. xxviii-xxxvi.
- Hajičová, E. et al. (2000) A Manual for Tectogrammatical Tagging of the Prague Dependency Treebank. UFAL/CKL Technical Report TR-2000-09.
- Hajičová, E., Kučerová, I. (2002) Argument/Valency Structure in PropBank, LCS Database and Prague Dependency Treebank. In LREC 2002, Proceedings, pp. 846-851.
- Levin, B. (1993) English Verb Classes and Alternations: A Preliminary Investigation. Chicago: University of Chicago.
- Lopatková, M. et al. (2002) Tektogramaticky anotovaný valenční slovník českých sloves. UFAL/CKL Technical Report TR-2002-15.
- Lopatková, M., Žabokrtský, Z. (2002) Valency Dictionary of Czech Verbs. In LREC 2002, Proceedings, pp. 949-956.
- Lopatková, M. (2003) Valency in the Prague Dependency Treebank: Building the Valency Lexicon. PBML 79. (in press)
24 References II.
- Pala, K., Ševeček, P. (1997) Valence českých sloves. In Sborník prací FFUB, Brno.
- Palmer, M. et al. (2001) Automatic Predicate Argument Analysis of the Penn TreeBank. In HLT 2001, Proceedings, San Francisco: Morgan Kaufmann.
- Panevová, J. (1974-75) On Verbal Frames in Functional Generative Description. Part I, PBML 22, pp. 3-40, Part II, PBML 23, pp. 17-52.
- Panevová, J. (1994) Valency Frames and the Meaning of the Sentence. In Luelsdorff (ed.) The Prague School of Structural and Functional Linguistics, John Benjamins, pp. 223-243.
- Panevová, J. (1998) Ještě k teorii valence. Slovo a slovesnost 59, pp. 1-14.
- Panevová, J. (2000) Poznámky k valenci podstatných jmen. Čeština - univerzália a specifika 2, Masarykova Univerzita, Brno, pp. 173-180.
- Panevová, J. (2003) Some Issues of Syntax and Semantics of Verbal Modifications. In Proceedings of MTT 2003, Paris. (in press)
- Sgall, P. et al. (1986) The Meaning of the Sentence in Its Semantic and Pragmatic Aspects. Dordrecht: Reidel, Prague: Academia.