Post PAROLE/SIMPLE lexical resources and initiatives in Sweden

1
Post PAROLE/SIMPLE lexical resources and
initiatives in Sweden
  • Maria Toporowska Gronostaj
  • Maria.Gronostaj_at_svenska.gu.se

New Horizons for Linguistic Resources in a Global
Context, 7-8 July 2009, Barcelona
2
The main aims of my talk
  • present a work in progress on building a free
    full-scale lexical resource, the Swedish
    FrameNet (SFN), being conducted by the
    Swedish Language Bank
  • give an overview of lexical resources
    contributing to the development of SFN
  • describe a core of SFN, the SALDO lexicon
  • reflect on merging lexical data from different
    resources to acquire information on frames

3
The overall objectives of the SFN
  • create a robust lexical resource aimed at LT
    applications, with
    • an exhaustive morphological, syntactic and
      semantic description of lexical units, incl.
      information on frames and world knowledge
      relevant for word/text understanding
  • produce it cost-effectively by merging data from
    free lexical resources and re-using free software
    tools
  • ensure its content interoperability
  • create an interactive text-lexicon block with
    morphological and semantic annotations on the fly

4
Content interoperability: a challenge for SFN
  • The contributing lexicons are heterogeneous in
    several respects. They
    • have partly different types of content
    • were developed for different purposes
    • were or are developed by different groups of
      contributors
      • language experts
      • a collective effort of both language engineers
        and users of web-lexicons

5
Free lexical resources behind SFN (1)
  • SALDO: Swedish monolingual lexicon with semantic
    and morphological layers
    • 76,750 entries, 74,000 distinct semantic units
    • The Swedish Associative Thesaurus by
      L. Lönngren (1992), reincarnated by L. Borin
    • enhanced with a complete morphological
      description by L. Borin and M. Forsberg
  • People's Synonym Dictionary (web-lexicon)
    • 80,000 Swedish synonym pairs
    • synonymy graded from 5 to 0 by lexicon users
    • collective effort of web-lexicon users
    • language engineering: Viggo Kann

6
Free lexical resources behind SFN (2)
  • The People's Dictionary (Swedish/English)
    • collective effort of web-lexicon users
    • equivalents are graded by lexicon users
    • language engineering: Viggo Kann
  • SemNet
    • 52,800 hyperonymy/hyponymy relations
      automatically retrieved from the definitions of
      nouns and verbs in GLDB
  • Parole/Simple lexicons
    • 29,000 syntactic units (valency) and 8,500
      semantic units encoded with mandatory information

8
SALDO: unusual semantic network
  • Lexemes, arranged in a hierarchical network
    according to the principle of centrality,
    capturing semantic closeness between two lexemes
  • Semantic relations are postulated for both open
    and closed classes and can go beyond a word class
  • There are 51 primitive, semantically unrelated
    concepts that form the top nodes of the
    hierarchies capturing centrality. These nodes are
    connected to an artificial top node, PRIM, to
    form a tree
  • There are no synsets in the sense of WordNet.
    Neither glosses of the lexemes nor semantic
    relations such as hyponymy, hyperonymy or qualia
    relations are explicitly specified.

9
Semantic centrality in SALDO
  • Each lexical unit is given an obligatory main
    descriptor, the mother, which can be complemented
    by an optional determinative descriptor, the
    father
    • bröd (bread): mat + mjöl (food + flour)
    • brud (bride): gifta sig + hon (get married + she)
    • bröllop (wedding): gifta sig (get married)
    • gifta sig (get married): par (pair)
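
The mother/father scheme on this slide can be sketched as a small record type. This is only an illustration, not SALDO's actual data format: the class name `Entry` and the helper `describe` are hypothetical, and the four entries are the examples from the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical record for one lexical unit (not SALDO's real format)."""
    lexeme: str
    mother: str                   # obligatory main descriptor
    father: Optional[str] = None  # optional determinative descriptor


# The four examples from the slide.
ENTRIES = [
    Entry("bröd", "mat", "mjöl"),
    Entry("brud", "gifta sig", "hon"),
    Entry("bröllop", "gifta sig"),
    Entry("gifta sig", "par"),
]


def describe(entry: Entry) -> str:
    """Render an entry as 'lexeme: mother + father' (father omitted if absent)."""
    if entry.father is None:
        return f"{entry.lexeme}: {entry.mother}"
    return f"{entry.lexeme}: {entry.mother} + {entry.father}"


for entry in ENTRIES:
    print(describe(entry))
```

Making `mother` a required field and `father` an optional one mirrors the obligatory/optional distinction stated on the slide.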

10
Semantic relations in SALDO
  • The mother descriptor is usually
    • semantically closer to the key word
    • semantically and/or morphologically less complex
      than the key word
    • more frequent
    • stylistically more unmarked
    • acquired earlier in first and second language
      acquisition
  • Father descriptors are used mainly to
    differentiate lexemes having the same mother.
    • They are assigned to ca. 50% of words

11
Associative sets, assets
  • Keywords can function as mother or father
    descriptors for other lexemes and thus form the
    basis of any number of derived relations,
    referred to as assets
  • brud (bride): get married + she
    • kronbrud (crown bride): bride + chastity
    • brudbukett (bride bouquet): bouquet + bride
    • brudklänning (bride dress): dress + bride
    • brudkrona (bride crown): crown + bride
  • brudgum (bridegroom): get married + he
    • no assets
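
One way to read the asset idea is as an inverted index over descriptors: for each key word, collect the lexemes that use it as mother or as father. The sketch below assumes that reading. The triples are the English glosses from the slide, and `build_assets` is a hypothetical helper, not part of any SALDO tool.

```python
from collections import defaultdict

# (lexeme, mother, father) triples, using the English glosses from the slide.
DESCRIPTORS = [
    ("bride", "get married", "she"),
    ("crown bride", "bride", "chastity"),
    ("bride bouquet", "bouquet", "bride"),
    ("bride dress", "dress", "bride"),
    ("bride crown", "crown", "bride"),
    ("bridegroom", "get married", "he"),
]


def build_assets(descriptors):
    """Invert the descriptor triples: key word -> lexemes it describes."""
    assets = defaultdict(lambda: {"as_mother": [], "as_father": []})
    for lexeme, mother, father in descriptors:
        assets[mother]["as_mother"].append(lexeme)
        if father is not None:
            assets[father]["as_father"].append(lexeme)
    return assets


ASSETS = build_assets(DESCRIPTORS)

print(ASSETS["bride"])         # lexemes described via "bride"
print("bridegroom" in ASSETS)  # False: nothing uses it as a descriptor
```

With this toy data, "bride" has assets in both roles while "bridegroom" has none, which matches the contrast drawn on the slide.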

12
Assets sharing mother relations build natural
semantic groupings
  • sol (sun): lysa + himmel (shine + sky)
    • comet, moon, star: (shine + sky)
  • blinka (blink): lysa + snabbt (shine + quickly)
  • ljus (candle): lysa + brinna (shine + burn)
  • Lexemes sharing both mother and father
    descriptors are more closely related to each
    other than those with different father descriptors

13
SALDO: world knowledge (1)
  • SALDO is an intrinsic network capturing the world
    knowledge underlying lexical-semantic relations
  • The network relations are based on the notion of
    centrality, measured by the depth of an entry,
    i.e. its distance down from the PRIM root node
  • The deeper an entry lies in the tree, the less
    central it is
    • PRIM
      • one
        • unit
          • two
            • pair
              • get married
                • bride
  • The average depth of entries in SALDO is 5.7
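
Depth as a centrality measure amounts to counting mother links up to PRIM. The sketch below assumes the example on the slide is a single chain from PRIM down to "bride"; the `MOTHER` table and both functions are illustrative only, and the toy average it prints is not the lexicon-wide figure quoted on the slide.

```python
# Assumed mother links for the example path on the slide (English glosses).
MOTHER = {
    "one": "PRIM",
    "unit": "one",
    "two": "unit",
    "pair": "two",
    "get married": "pair",
    "bride": "get married",
}


def depth(lexeme, mother=MOTHER, root="PRIM"):
    """Count the mother links between a lexeme and the root node."""
    steps = 0
    seen = set()
    while lexeme != root:
        if lexeme in seen:
            raise ValueError(f"cycle detected at {lexeme!r}")
        seen.add(lexeme)
        lexeme = mother[lexeme]
        steps += 1
    return steps


def average_depth(mother=MOTHER):
    """Mean depth over all entries in the (toy) mother table."""
    return sum(depth(word, mother) for word in mother) / len(mother)


print(depth("bride"))   # 6 under the single-chain assumption
print(average_depth())  # 3.5 for this six-entry toy table
```

Under this assumption "bride" is less central than "pair", which in turn is less central than "one".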

14
SALDO: world knowledge (2)
  • SALDO supports the recognition of entailments by
    pointing out the mother of a key word, which
    promotes word/text understanding
  • It provides explicit information on the
    distribution of associative sets among lexemes
    (e.g. bride vs. bridegroom)
  • It includes named entities as entries
    • Bulgakov: författare + rysk (writer + Russian)

15
Approaches towards frame acquisition in SFN
  • Merging relevant lexical data from available free
    lexical resources
  • Cross-language transfer of lexical units with
    information on the frames and frame elements from
    FN to SFN
  • Automatic acquisition of frames from corpora
    using a software tool, the FrameNet labeler for
    Swedish text

16
Merging lexical data with SALDO involves
  • interlinking the morphological units from the
    component lexicons (based on the lemma's form,
    part of speech and inflectional patterns,
    whenever possible)
  • augmenting SALDO's lexical units with the
    semantic content from SemNet, SIMPLE, the People's
    Synonym Dictionary and English equivalents from
    the People's Dictionary (Swedish/English)
  • adding syntactic information from the PAROLE
    lexicon to SALDO
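
A minimal sketch of the interlinking step, assuming entries are matched on a (lemma, part of speech) key and that SALDO acts as the core to which other lexicons add layers. The record layouts, field names and the `merge_lexicons` helper are all hypothetical; inflectional patterns are left out for brevity.

```python
# Toy lexicons keyed by (lemma, part of speech). All field names are
# hypothetical and chosen only for this illustration.
SALDO = {
    ("gifta sig", "verb"): {"mother": "par"},
    ("brud", "noun"): {"mother": "gifta sig", "father": "hon"},
}
PAROLE = {
    ("gifta sig", "verb"): {"valency": ["Sub V PrepObj(med)", "Sub(Plural) V"]},
}
SIMPLE = {
    ("gifta sig", "verb"): {"semantic_type": "Cooperative activity"},
    ("okänd", "noun"): {"semantic_type": "Unknown"},  # no SALDO counterpart
}


def merge_lexicons(core, *sources):
    """Attach each source record to the core entry with the same key.

    Source entries without a matching core key are skipped, so the core
    lexicon determines which lexical units exist in the result.
    """
    merged = {key: dict(record) for key, record in core.items()}
    for name, lexicon in sources:
        for key, record in lexicon.items():
            if key in merged:
                merged[key][name] = dict(record)
    return merged


MERGED = merge_lexicons(SALDO, ("PAROLE", PAROLE), ("SIMPLE", SIMPLE))
print(MERGED[("gifta sig", "verb")])
```

Keeping each source under its own key preserves provenance, which may help when the contributing lexicons disagree.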

17
Frame acquisition supported by PAROLE/SIMPLE
  • V: gifta sig (to marry/get married)
  • PAROLE
    • Sub. (Anim.) V (refl.) PrepObj (Anim.) med
      (with)
    • Sub. (Plural) (Anim.) V (refl.)
  • SIMPLE
    • Semantic type of V: Cooperative activity
    • Selection restrictions: Human V Human, Human V
    • Human V (Cooperative activity) Human > Partner(s)
  • In FN the Partner role is a core FE in the
    frames
    • Collaboration, Forming Relationship, Personal
      relations
  • Given the semantic and syntactic data in the P/S
    lexicon, the frame Forming Relationship is
    selected for the verb marry

18
Automatic acquisition of frames and FEs
  • a software tool, the FrameNet labeler for Swedish
    text
    • developed by R. Johansson and P. Nugues
    • trained on a semantically annotated corpus
      produced by cross-language transfer
    • 75% accuracy in the classification of FEs

19
Populating the frames in SFN with lexical units
  • re-using the lexical data retrieved from corpora
    by the FrameNet labeler
  • cross-language transfer of lexical units from FN
    to SFN
  • semantic mining and refining lexical data in the
    SIMPLE lexicon
  • enhancing the repository of lexical units with
    synonyms, hyponyms and siblings

20
Conclusions (1)
  • Lexicons can be re-purposed and re-used for the
    task of SFN creation
  • Content integration and interoperability seem
    feasible to achieve
  • SFN can be augmented with
    • synsets to compensate for the lack of glosses
      (data from the People's Synonym Dictionary)
    • hyperonymy/hyponymy relations from SemNet
    • world knowledge from the SALDO lexicon
  • Creation of a text-lexicon block with SALDO
    annotations on the fly is in progress

21
Conclusions (2)
  • Desirable further extensions of SFN
    • valency information
    • explicit semantic typing of lexical units
    • multi-word expressions
    • broader coverage of different domains
    • creation of a text-lexicon block with semantic
      role annotations
  • SFN will be a Swedish contribution to
    BLARK/CLARIN, available under the Creative Commons
    Attribution-Share Alike licence and LGPL 3.0

22
References
  • Borin, L., Forsberg, M. 2009. All in the Family: A
    comparison of SALDO and WordNet. Proceedings of
    the 17th Nordic Conference of Computational
    Linguistics NODALIDA 2009. Odense.
  • Johansson, R., Nugues, P. 2007. Construction of a
    FrameNet labeler for Swedish Text. NODALIDA 2007.
    Helsinki.
  • Kann, V., Rosell, M. 2005. Free Construction of
    a Swedish Dictionary of Synonyms. NODALIDA 2005.
    Joensuu.
  • Lönngren, L. 1989. Svensk associationslexikon.
    Del I-IV. Institutionen för lingvistik. Uppsala
    universitet. Rapport UCDL-R-89-1.
  • Lönngren, L. 1998. A Swedish associative
    thesaurus. In Euralex 98 proceedings, Vol. 2, pp.
    467-474.
  • SALDO: http://spraakbanken.gu.se/sal/