Issues of Valency in Prague Dependency Treebank: Creating Valency Lexicon of Verbs

1
Issues of Valency in Prague Dependency Treebank
Creating Valency Lexicon of Verbs
  • Markéta Lopatková
  • Center for Computational Linguistics
  • MFF UK, Prague

CIL XVII, Prague, July 26, 2003
1
2
Motivation
  • traditional linguistics
  • source of data for linguistic research
  • verification of theoretical criteria set up
  • natural language processing
  • lemmatization
  • morphological tagging
  • syntactic analysis
  • word sense disambiguation
  • semantic analysis
  • machine translation
  • building other resources
  • language acquisition

3
Syntactic vs. semantic approach I.
  • Levin Verb Classes (Levin, 1993)
  • hypothesis: syntactic features of verbs are
    semantically determined
  • method: syntactic behavior → semantic classes
  • alternation: a change in the realization of
    the argument structure of a verb
  • conative alternation
  • Edith cuts the bread ↔ Edith cuts at the bread
  • classes: verbs which undergo certain types of
    alternations
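
The method on this slide (verbs that undergo the same alternations form a class) can be sketched as grouping verbs by their set of alternations. This is only a minimal illustration: the small table is the well-known cut / hit / touch / break comparison from the introduction of Levin (1993), not data from the slide, and the name `classes_by_alternation` is hypothetical.

```python
from collections import defaultdict

# Which alternations each verb takes part in (example from the
# introduction of Levin 1993; not taken from the slide itself).
ALTERNATIONS = {
    "cut":   {"conative", "body-part possessor ascension", "middle"},
    "hack":  {"conative", "body-part possessor ascension", "middle"},
    "hit":   {"conative", "body-part possessor ascension"},
    "kick":  {"conative", "body-part possessor ascension"},
    "touch": {"body-part possessor ascension"},
    "pat":   {"body-part possessor ascension"},
    "break": {"middle"},
    "crack": {"middle"},
}


def classes_by_alternation(table):
    """Group verbs that share exactly the same set of alternations."""
    groups = defaultdict(list)
    for verb, alternations in table.items():
        groups[frozenset(alternations)].append(verb)
    return {signature: sorted(verbs) for signature, verbs in groups.items()}


classes = classes_by_alternation(ALTERNATIONS)
for signature, verbs in classes.items():
    print(sorted(signature), verbs)
```

Each distinct alternation signature yields one candidate class; a real classification would of course rest on far more alternations and verbs.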

4
Syntactic vs. semantic approach II.
  • PropBank (Palmer et al., 2001)
  • layer of semantic annotation in the Penn Treebank
  • argument structure for verbs
  • arguments: Arg0, ..., Arg5
  • modifiers: ArgM (LOC, TMP, EXT, PRP, ADV)
  • He was drawing diagrams and sketches for his
    patron.
  • Arg0: he
  • Rel: drawing
  • Arg1: diagrams and sketches
  • Arg2-for: his patron
  • He keeps sth in the fridge.
  • Arg0: he
  • Rel: keeps
  • Arg1: sth
  • Arg2-in: the fridge
  • (also Hajičová, Kučerová, 2002)
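
The two annotated sentences above can be written down as a small data structure. This is a hedged sketch of how such a labelling might be held in memory, not PropBank's actual file format; the class `Proposition` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Proposition:
    """One verb occurrence with its labelled arguments (illustrative only)."""
    rel: str                                    # the predicate itself
    roles: dict = field(default_factory=dict)   # label -> text span

    def numbered(self):
        """Return only the numbered arguments (Arg0 ... Arg5), not ArgM."""
        return {k: v for k, v in self.roles.items() if not k.startswith("ArgM")}

    def render(self):
        parts = [f"Rel: {self.rel}"]
        parts += [f"{label}: {span}" for label, span in self.roles.items()]
        return "; ".join(parts)


drawing = Proposition(
    rel="drawing",
    roles={"Arg0": "he", "Arg1": "diagrams and sketches", "Arg2-for": "his patron"},
)
keeps = Proposition(
    rel="keeps",
    roles={"Arg0": "he", "Arg1": "sth", "Arg2-in": "the fridge"},
)

print(drawing.render())
print(keeps.render())
```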

5
Syntactic vs. semantic approach III.
  • FrameNet (Fillmore, 2002)
  • it groups lexical items with parallel semantic
    characterization
  • the structure and particular components
    correspond to semantic roles of the common
    semantic frame
  • verbs, nouns, adjectives, prepositions
  • Communication
  • frame elements: Speaker, Message, Addressee,
    Topic, Medium
  • Tom communicates with Kim about the festival.
  • Tom communicates with Kim by letter.
  • Tom communicates the message to me.
  • Reciprocality
  • frame elements: Protagonists, Prot-1, Prot-2
  • Tom fought with Kim.
  • Tom and Kim fought.

6
Syntactic vs. semantic approach IV.
  • LCS Database (Lexical Conceptual Structure)
    (Dorr, 2001)
  • semantic representation
  • semantic structure, semantic content
  • verb: cut down
  • lexical item: (act_on loc (thing 1) (thing 2)
    ((on 23) loc (head) (thing 24))
    (cutingly 26) (down/m))
  • sentence: United States cut down (the) quota.
  • (act_on loc (us) (quota)
    ((on 23) loc (head) (thing 24))
    (cutingly 26) (down/m))
  • logic arguments (ag, exp, th, src, goal, info,
    perc, loc, poss, time, prop)
  • logic modifiers (mod-poss, ben, instr, purp,
    mod-loc, manner, mod-prop)
  • cut down: _ag_th,mod-loc(on)

7
Prague Dependency TreeBank
  • based on
  • Functional Generative Description (FGD) (Sgall et
    al., 1986)
  • dependency-oriented
  • stratificational
  • level of underlying representation
  • (tectogrammatical level) (described in
    Hajičová et al., 2000)
  • valency theory (esp. Panevová, 1994)

8
Valency in FGD I.
  • complementations
  • inner participants vs. free modifications
  • obligatory vs. optional
  • valency frame
  • Matka.ACT předělala dětem.ADDR loutku.PAT z
    Kašpárka.ORIG na čerta.EFF.
  • Mother re-made a puppet for the children from a
    Punch into an imp. (Panevová)
  • V Praze.LOC se sejdeme na Hlavním nádraží.LOC u
    pokladen.LOC. (Panevová)
  • In Prague we will meet at the Main Station near
    the booking office.

(table: inner participants and free modifications, each either obligatory or optional)
9
Valency in FGD II.
  • a middle position
  • syntactic criteria are used for the
    identification of Actor and Patient (Actor is the
    first inner participant, the second is always a
    Patient)
  • other inner participants (Addressee, Origin and
    Effect) as well as free modifications are
    determined in accordance with semantic
    considerations
  • concept of shifting (Panevová, 1974-75)
  • (diagram: shifting among Actor, Patient,
    Addressee, Origin and Effect)
  • Kniha.ACT vyšla. (Panevová) The book came out.
  • Chlapec.ACT vyrostl v muže.PAT. (Panevová) The
    boy grew up into a man.
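
The syntactic rule on this slide (the first inner participant is the Actor, the second is always the Patient, the rest keep their semantic labels) can be sketched as a small function. This is an assumption-laden illustration, not the procedure of Panevová (1974-75): the function name `assign_functors`, the choice of which participant shifts when several could (the first one listed), and the cognitive labels given to the two example sentences are all my own reading.

```python
def assign_functors(cognitive_roles):
    """Map the cognitive roles of a verb's inner participants to functors.

    cognitive_roles: labels such as "ACT", "PAT", "ADDR", "ORIG", "EFF",
    each at most once. Returns {cognitive role: functor}.
    """
    roles = list(cognitive_roles)
    if len(roles) != len(set(roles)):
        raise ValueError("each inner participant can occur only once")
    if not roles:
        return {}

    result = {}
    # The first slot is always ACT: keep a cognitive Actor there if present,
    # otherwise shift the first listed participant into it (assumption).
    act_source = "ACT" if "ACT" in roles else roles[0]
    result[act_source] = "ACT"
    roles.remove(act_source)

    # The second slot is always PAT, filled in the same way.
    if roles:
        pat_source = "PAT" if "PAT" in roles else roles[0]
        result[pat_source] = "PAT"
        roles.remove(pat_source)

    # Any further participants keep their semantically determined labels.
    for role in roles:
        result[role] = role
    return result


# "Kniha vyšla": a single participant becomes the Actor.
print(assign_functors(["PAT"]))
# "Chlapec vyrostl v muže": the second participant becomes the Patient.
print(assign_functors(["ACT", "EFF"]))
# With three or more participants nothing needs to shift here.
print(assign_functors(["ACT", "PAT", "ADDR"]))
```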

10
Valency in FGD III.
  • valency of autosemantic words
  • verbs (Panevová, from the seventies)
  • 5 inner participants - Actor, Patient, Addressee,
    Origin, Effect
  • approx. 45 free modifications
  • shifting of cognitive roles for inner
    participants
  • nouns (esp. Panevová, 2000, Řezníčková,
    manuscript)
  • verbal complementations
  • spec. nominal complementations - Identity,
    Partitive, Appurtenance, Restrictive and
    Descriptive Attribute
  • adjectives (Panevová, 1998)
  • verbal complementations
  • spec. adjectival complementations

11
Valency structure on TR level of PDT
  • the core of annotation on the tectogrammatical
    level
  • problem of consistency → valency lexicon
  • verbs
  • two branches
  • lists of verbs with their complementations being
    created and used by annotators (PDT-VALLEX)
  • complex valency lexicon (VALLEX)
  • nouns
  • the theoretical aspects and methodology are
    currently being refined (Řezníčková, manuscript)
  • lists of nouns with their complementations
  • adjectives
  • lists of adjectives with their complementations

12
Valency lexicon of verbs PDT-VALLEX
  • lists being created and used by annotators
  • valency frames of verbs in their particular
    meanings, as they appear during annotation; the
    lexeme as a whole is not analyzed
  • the information specifying elements of frames
  • functor - i.e. name of complementation
  • type - obligatory / optional
  • possible morphemic form(s)
  • example(s)
  • it serves for consistency of annotation
  • approx. 4,700 verbs with 7,150 valency frames
    (i.e. 1.5 frames per verb)
  • dát (to give) ... ACT(1;obl) ADDR(3;obl)
    PAT(4;obl)
  • dát někomu knihu (to give sb a book)
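
The information listed for each frame element (functor, obligatory or optional type, possible morphemic forms, examples) maps naturally onto a small record type. The sketch below is an illustration only and does not reproduce the actual PDT-VALLEX file format; the classes `Slot` and `Frame` and the `functor(form;type)` rendering are assumptions. The digits stand for Czech morphological cases (1 nominative, 3 dative, 4 accusative).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """One element of a valency frame."""
    functor: str        # name of the complementation, e.g. "ACT"
    forms: tuple        # possible morphemic forms, e.g. ("1",) for nominative
    obligatory: bool    # True = obligatory, False = optional

    def render(self):
        kind = "obl" if self.obligatory else "opt"
        return f"{self.functor}({','.join(self.forms)};{kind})"


@dataclass(frozen=True)
class Frame:
    """A valency frame of one verb in one particular meaning."""
    lemma: str
    gloss: str
    slots: tuple
    example: str = ""

    def render(self):
        return " ".join(slot.render() for slot in self.slots)

    def obligatory_functors(self):
        return [slot.functor for slot in self.slots if slot.obligatory]


dat = Frame(
    lemma="dát",
    gloss="to give",
    slots=(
        Slot("ACT", ("1",), True),
        Slot("ADDR", ("3",), True),
        Slot("PAT", ("4",), True),
    ),
    example="dát někomu knihu",
)

print(dat.lemma, dat.render())
```

Keeping frames immutable makes them safe to share between annotators' lists and later lookups, which fits the stated purpose of supporting consistent annotation.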

13
Valency lexicon of verbs VALLEX
  • complex information on the whole verb lexeme in
    all its meanings (Lopatková, Žabokrtský, 2002)
  • the information on particular valency frames,
    corresponding to its meanings (described with
    gloss(es) and example(s))
  • the information specifying elements of frames
  • functor - i.e. name of complementation
  • type - obligatory / optional
  • possible morphemic form(s)
  • mluvit (to speak)
  • ... ACT(1;obl) ADDR(s+7;obl)
    PAT(o+6;opt)
  • mluvila s ním o dětech (she spoke with him about
    the children)
  • additional syntactic information

14
Valency lexicon of verbs VALLEX II.
  • additional syntactic information for particular
    valency frames
  • reflexivity (in progress)
  • reciprocity
  • control
  • aspect and aspectual counterparts
  • possible diatheses, passivization (future plans)
  • primary / secondary / idiomatic usage
  • syntactic/semantic class (in progress)
  • pointers to Czech EuroWordNet (in progress)
  • frequency of a particular frame in samples of ČNK
    (60 occurrences of each verb lexeme)

15
Valency lexicon of verbs VALLEX III.
  • current state
  • 1,400 verbs with 3,860 frames (i.e. 2.7 frames
    per verb)
  • verbs chosen according to their frequency in the
    Czech National Corpus and PDT
  • about 85% of running text in PDT
  • open questions
  • enriched valency frame
  • syntactic-semantic classes
  • alternative frames
  • frozen collocations

16
Valency lexicon of verbs
  • Why two branches?
  • PDT-VALLEX: extensive
  • necessary for annotation
  • recall improves relatively quickly
  • VALLEX: intensive
  • the whole lexeme is analyzed en bloc → adequate
    and consistent description
  • precision improves
  • the two branches are to be merged
  • PDT-VALLEX: valuable source for VALLEX

18
Enriched valency frames I.
  • inner participants
  • each inner participant can occur only once
    (with a single occurrence of a verb)
  • combination of inner participants must be listed
    for a particular verb
  • morphemic form is predicted by the governing verb
  • concept of shifting is applied
  • free modifications
  • each free modification can be repeated
  • syntactically, they can modify any verb
    (often only semantic restrictions apply)
  • they have typical semantics
  • they do not undergo shifting
19
Enriched valency frames II.
  • quasi-valency complementations (also Panevová,
    2003)
  • each quasi-valency complementation can occur only
    once (with any occurrence of a verb)
  • each quasi-valency complementation is
    characteristic of a limited list of verbs
  • morphemic form is predicted by the governing verb
  • they have typical semantics
  • they do not undergo shifting
  • Obstacle: uhodit hlavou o větev.OBST (to bump
    one's head against a bough)
  • zavadit o stůl.OBST (to brush against a table)
  • Difference: prodloužit o hodinu.DIFF (to prolong
    by one hour)
  • Mediator: vzít někoho za ruku.MDT (to take sb
    by his/her hand)

20
Enriched valency frames III.
  • typical modifications
  • optional free modifications commonly used with
    a verb
  • usually modify group of verbs with similar
    meaning
  • morphemic form
  • prototypical for some modifications
  • e.g. Dative case or prep. group pro (for) + Acc
    for Benefactor
  • determined by the typical semantics of the
    modifying members
  • e.g. prep. groups na (on) + Loc and v (in) + Loc
    typically specify Location
  • verbs of motion: typically modified by
    Direction modification (provided that
    Direction is not obligatory)
  • jít do kina / přes les / jít z domova
  • to go to the cinema / through the wood / from home
  • verbs of exchange: typically modified by
    modification of Recompense
  • dát / dostat / získat / kupovat / brát něco.PAT
    za něco.RCMP
  • to give / get / obtain / buy / take something
    for something
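
One way to picture an enriched frame is as the ordinary valency frame plus the typical modifications of the verb's semantic group. The sketch below is hypothetical throughout: the function `enriched_frame`, the two lookup tables, and the cover label "DIR" for Direction are assumptions for illustration (only RCMP appears as a functor on the slide), and the verb lists contain just the verbs named above.

```python
# Semantic group of each verb (only the verbs mentioned on the slide).
VERB_CLASS = {
    "jít": "motion",
    "dát": "exchange",
    "dostat": "exchange",
    "získat": "exchange",
    "kupovat": "exchange",
    "brát": "exchange",
}

# Typical modification(s) per group; "DIR" is an assumed cover label.
TYPICAL = {
    "motion": ("DIR",),
    "exchange": ("RCMP",),
}


def enriched_frame(verb, valency_frame):
    """Append typical modifications that the frame does not already contain.

    A modification already present in the frame (e.g. an obligatory
    Direction) is not added again, mirroring the proviso on the slide.
    """
    typical = TYPICAL.get(VERB_CLASS.get(verb), ())
    extra = tuple(f for f in typical if f not in valency_frame)
    return tuple(valency_frame) + extra


print(enriched_frame("dát", ("ACT", "ADDR", "PAT")))
print(enriched_frame("jít", ("ACT",)))
print(enriched_frame("jít", ("ACT", "DIR")))
```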

21
Exploitation of the valency lexicon
  • achieving consistency in assigning the valency
    structure (PDT-VALLEX)
  • automatic syntactic analysis (shallow parsing)
  • tectogrammatical parser
  • automatic system for creating an underlying
    representation of Czech sentences
  • source data for building the valency lexicon of
    nouns

22
Resources
  • theoretical articles on valency (Panevová)
  • The Manual for Tectogrammatical Tagging of the
    Prague Dependency Treebank (Hajičová et al.,
    2000)
  • lists of particular valency frames created by
    annotators
  • electronic valency dictionary of surface
    realizations of verbal modifiers
    (FI MU Brno, Pala, Ševeček, 1997)
  • printed dictionaries
  • Slovesa pro praxi (SPP, 1997), valency
    specification of the 767 most frequent verbs
  • Slovník spisovného jazyka českého (SSJČ, 1964)
  • Slovník spisovné češtiny pro školu a veřejnost
    (SSČ, 1978)
  • Slovník českých synonym (SČS, 1994)
  • Slovník české frazeologie a idiomatiky (SČFI,
    1983)
  • Czech National Corpus (ČNK)
  • EuroWordNet, Czech WordNet

23
References I.
  • Dorr, B.J. (2001) LCS Verb Database, Online
    Software Database of Conceptual Structures and
    Documentation, UMCP.
  • Fillmore, Ch. (2002) FrameNet and the Linking
    between Semantic and Syntactic Relations.
    In COLING 2002, Proceedings, pp. xxviii-xxxvi.
  • Hajičová, E. et al. (2000) A Manual for
    Tectogrammatical Tagging of the Prague Dependency
    Treebank. ÚFAL/CKL Technical Report TR-2000-09.
  • Hajičová, E., Kučerová, I. (2002)
    Argument/Valency Structure in PropBank, LCS
    Database and Prague Dependency Treebank. In LREC
    2002, Proceedings, pp. 846-851.
  • Levin, B. (1993) English Verb Classes and
    Alternations: A Preliminary Investigation.
    Chicago: University of Chicago Press.
  • Lopatková, M. et al. (2002) Tektogramaticky
    anotovaný valenční slovník českých sloves.
    ÚFAL/CKL Technical Report TR-2002-15.
  • Lopatková, M., Žabokrtský, Z. (2002) Valency
    Dictionary of Czech Verbs. In LREC 2002,
    Proceedings, pp. 949-956.
  • Lopatková, M. (2003) Valency in the Prague
    Dependency Treebank: Building the Valency
    Lexicon. PBML 79. (in press)

24
References II.
  • Pala, K., Ševeček, P. (1997) Valence českých
    sloves. In Sborník prací FFUB, Brno.
  • Palmer, M. et al. (2001) Automatic Predicate
    Argument Analysis of the Penn TreeBank. In HLT
    2001, Proceedings, San Francisco: Morgan Kaufmann.
  • Panevová, J. (1974-75) On Verbal Frames in
    Functional Generative Description. Part I, PBML
    22, pp. 3-40, Part II, PBML 23, pp. 17-52.
  • Panevová, J. (1994) Valency Frames and the
    Meaning of the Sentence. In Luelsdorff (ed.) The
    Prague School of Structural and Functional
    Linguistics, John Benjamins, pp. 223-243.
  • Panevová, J. (1998) Ještě k teorii valence. Slovo
    a slovesnost 59, pp. 1-14.
  • Panevová, J. (2000) Poznámky k valenci
    podstatných jmen. Čeština - univerzália a
    specifika 2, Masarykova Univerzita, Brno, pp.
    173-180.
  • Panevová, J. (2003) Some Issues of Syntax and
    Semantics of Verbal Modifications. In
    Proceedings of MTT 2003, Paris. (in press)
  • Sgall, P. et al. (1986) The Meaning of the
    Sentence in Its Semantic and Pragmatic Aspects.
    Dordrecht: Reidel, Prague: Academia.
