Title: Latin WordNet project
1. Latin WordNet project
- Stefano Minozzi
- Laboratorio di Informatica Umanistica, Università
degli Studi di Verona
2. Latin WordNet project
- Laboratorio di Informatica Umanistica, Università
degli Studi di Verona - http://www.cyllenius.net/labium/
- The Cognitive and Communication Technologies
(TCC) division, Fondazione Bruno Kessler,
Trento - http://cit.fbk.eu/en/research
3. Historical credits
The Latin WordNet project is indebted to:
- Princeton WordNet, a lexical database for the
English language (created and maintained at the
Cognitive Science Laboratory of Princeton University
under the direction of psychology professor
George A. Miller; development began in 1985)
- MultiWordNet, a multilingual lexical database in
which the Italian WordNet is strictly aligned
with Princeton WordNet v. 1.6 (developed since
1994 at the Istituto Trentino di Cultura, now
Fondazione Bruno Kessler)
4. MultiWordNet: multilingual lexical matrix
[Figure: lexical matrix; labels: language, meaning, lemma]
5. What Latin WordNet represents
- Semantic parts of speech:
  - Nouns
  - Verbs
  - Adjectives
  - Adverbs
- Lexical relations that connect words
- Meanings are treated as constant across
languages, while the lexicalization of a
meaning is a language-specific variable
6. Structure of the database
7. The synset (group of synonyms) is the building
block of WordNet
v00682542: express an idea, etc. in words; "He
said that he wanted to marry her"; "tell me
what is bothering you"; "state your opinion"
Latin:
synset lemma
v00682542 adnuntio
v00682542 dico
v00682542 effor
v00682542 enuntio
v00682542 for
v00682542 inquam
v00682542 inseco
v00682542 loquor
v00682542 narro
English:
synset word
v00682542 state
v00682542 say
v00682542 tell
Italian:
synset word
v00682542 dire
v00682542 enunciare
v00682542 enunziare
v00682542 raccontare
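A minimal Python sketch of how one shared synset identifier ties the three languages together. The dictionary layout and the helper names `lemmas_for` and `synsets_of` are illustrative assumptions, not the actual MultiWordNet schema; the data are the rows listed for v00682542.

```python
# Illustrative layout (an assumption, not the real MultiWordNet schema):
# one shared synset ID, one list of lemmas per language.
SYNSETS = {
    "v00682542": {
        "gloss": "express an idea, etc. in words",
        "latin": ["adnuntio", "dico", "effor", "enuntio", "for",
                  "inquam", "inseco", "loquor", "narro"],
        "english": ["state", "say", "tell"],
        "italian": ["dire", "enunciare", "enunziare", "raccontare"],
    },
}


def lemmas_for(synset_id, language):
    """Return the lemmas that lexicalize a synset in one language."""
    return SYNSETS[synset_id][language]


def synsets_of(lemma, language):
    """Return the IDs of every synset that contains the lemma."""
    return [sid for sid, entry in SYNSETS.items() if lemma in entry[language]]


print(lemmas_for("v00682542", "latin"))
print(synsets_of("dire", "italian"))
```

Because the meaning (the synset ID) is the constant and the lemma lists are the language-specific variable, adding Latin amounts to adding one more key per synset.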
8. The synsets are linked by relations
9. Relations for adjectives and adverbs
10.
- Moreover, the synsets are tagged with semantic
field labels in order to create domain-related
dictionaries
11. Building the semantic network
12.
- Building a semantic network from scratch is very
time-consuming
- The available resources permit a different approach:
- Automatic assignment of synsets
- Manual correction of the results
13. Building blocks
- Latin-to-Italian MRD (machine-readable dictionary;
mostly from G. B. Conte, E. Pianezzola)
- Latin-to-English MRD (mostly from OLD, via
William Whitaker's Words)
- Italian and English branches of MultiWordNet
14. We developed a number of assignment strategies
- Multilingual intersection method: exploits the
multilingual nature of MultiWordNet
- Generic probability: for very specialized words,
where polysemy is very limited
- Gloss correspondence: exploits the glosses present
in the MRD
- Intersection of synsets: assigns a lemma to a
synset when several of its translation
equivalents point to the same synset
15. Intersection method
amor, is
English equivalents: love, affection the beloved Cupid affair
desire, passion sexual passion illicit passion
Italian equivalents: amore persona amata, amore questioni amorose,
amorazzi storie d'amore amore,
desiderio Amore gli Amori, gli Amorini
[Figure: synsets from Italian, synsets from English, and their intersection]
Intersection for amor, is:
n04478900
n05567241
n05607724
n05608483
n07109169
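The multilingual intersection can be sketched as a plain set intersection. This is a hedged illustration, not the project's code: the function name is hypothetical, the five shared IDs are the ones listed for amor and are assumed here to be the intersection result, and the language-only IDs are made-up placeholders standing in for synsets reached through one language alone.

```python
def multilingual_intersection(synsets_from_italian, synsets_from_english):
    """Keep only the synsets reached through both languages."""
    return sorted(set(synsets_from_italian) & set(synsets_from_english))


# The five IDs listed for "amor" (assumed to be the intersection result).
shared = ["n04478900", "n05567241", "n05607724", "n05608483", "n07109169"]

# Placeholder IDs, invented for illustration only: synsets that the
# equivalents of a single language would reach.
from_italian = shared + ["n_italian_only_1", "n_italian_only_2"]
from_english = shared + ["n_english_only_1"]

print(multilingual_intersection(from_italian, from_english))
```

The design idea is that a sense reached independently through both the Italian and the English equivalents is much less likely to be an accident of one language's polysemy.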
16. Generic probability
abactor, oris → rustler, cattle_thief
one_who_drives_off
SYNSET: n07541894
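The slides do not spell out the rule behind generic probability, so the following is only one plausible reading: when all translation equivalents of a highly specialized lemma lead to a single synset, assign it directly. The function name and the `synsets_by_word` lookup are assumptions; the lookup simply encodes the abactor example.

```python
def assign_if_unambiguous(equivalents, synsets_by_word):
    """Return the single candidate synset, or None when the choice is ambiguous."""
    candidates = set()
    for word in equivalents:
        # Equivalents missing from the lookup contribute no candidates.
        candidates |= synsets_by_word.get(word, set())
    return next(iter(candidates)) if len(candidates) == 1 else None


# Assumed lookup, encoding the example: both English equivalents
# belong to the one synset shown for "abactor".
synsets_by_word = {
    "rustler": {"n07541894"},
    "cattle_thief": {"n07541894"},
}

equivalents = ["rustler", "cattle_thief", "one_who_drives_off"]
print(assign_if_unambiguous(equivalents, synsets_by_word))
```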
17. Gloss correspondence
punctum, i → point, dot point, spot small_hole,
pin_prick sting, small_puncture (of_insect)
vote, tick tiny_amount full-stop, period
(punctuation)
PERIOD (9 synsets):
n05126526 n09715092 n10843624 n10868422 n10869183
n10954173 n10961157 n10982844 n10988653
Synset whose gloss corresponds:
n05126526: period, point, full_stop, stop,
full_point: a punctuation mark (.) placed at the
end of a declarative sentence to indicate a full
stop or after abbreviations
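Gloss correspondence can be approximated by counting the words that the MRD gloss shares with each candidate synset's synonyms and gloss. This is a sketch under stated assumptions: simple word overlap stands in for whatever matching the project actually used, the function names are hypothetical, and the second candidate is an invented entry added only for contrast.

```python
import re


def tokens(text):
    """Lowercase word tokens; underscores and hyphens act as separators."""
    return set(re.findall(r"[a-z]+", text.lower()))


def best_gloss_match(mrd_gloss, candidates):
    """Return the candidate ID whose text shares the most words with the MRD gloss."""
    wanted = tokens(mrd_gloss)
    return max(candidates, key=lambda sid: len(wanted & tokens(candidates[sid])))


candidates = {
    # Synonyms and gloss of n05126526 as given in the example.
    "n05126526": "period point full_stop stop full_point a punctuation mark (.) "
                 "placed at the end of a declarative sentence to indicate a "
                 "full stop or after abbreviations",
    # Invented entry for contrast only; not a real synset.
    "n_hypothetical": "an amount of time",
}

print(best_gloss_match("full-stop, period (punctuation)", candidates))
```

The parenthesized hint "(punctuation)" in the dictionary entry is what separates the punctuation sense from the other senses of "period".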
18. Intersection of synsets
punctum, i → point, dot point, spot small_hole,
pin_prick sting, small_puncture (of_insect)
vote, tick tiny_amount full-stop, period
(punctuation)
POINT (24 synsets):
n02582551 n03150523 n03150944 n03151033 n03719894
n03720036 n03958380 n04481751 n04514257 n04589546
n04867079 n04955967 n05110203 n05126526 n06351684
n06745866 n09780630 n09869507 n09933792 n09962048
n10018378 n10025218 n10044643 n10898122
DOT (2 synsets): n05096549 n10025218
Synset common to POINT and DOT: n10025218
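The intersection of synsets can be sketched as vote counting over the translation equivalents, using the POINT and DOT lists for punctum. The function name and the threshold `min_votes=2` are assumptions; the slides only say "a number of" equivalents must agree.

```python
from collections import Counter

# Synset IDs for the two equivalents of "punctum" shown in the example.
POINT = """
n02582551 n03150523 n03150944 n03151033 n03719894 n03720036
n03958380 n04481751 n04514257 n04589546 n04867079 n04955967
n05110203 n05126526 n06351684 n06745866 n09780630 n09869507
n09933792 n09962048 n10018378 n10025218 n10044643 n10898122
""".split()
DOT = ["n05096549", "n10025218"]


def shared_synsets(synsets_by_equivalent, min_votes=2):
    """Return synsets reached by at least min_votes translation equivalents."""
    votes = Counter()
    for synsets in synsets_by_equivalent.values():
        votes.update(set(synsets))  # each equivalent votes once per synset
    return sorted(sid for sid, count in votes.items() if count >= min_votes)


print(shared_synsets({"point": POINT, "dot": DOT}))
```

With these two lists the only synset that receives two votes is n10025218, so 24 + 2 candidates collapse to a single assignment for manual checking.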
19. Lexical gaps
LEXICAL UNIT → FREE COMBINATION
abactor, is → Latin-to-Italian gap: ladro di
bestiame ("cattle thief")
20. Size of the database
Latin        Noun   Verb   Adj   Adv   TOTAL
SYNSETS      5621   2283   775   294    8973
LEMMAS       4777   2609  1259   479    9124
WORD SENSES 13060  10062  2054   732   25908
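As a quick arithmetic check, the per-part-of-speech counts can be summed and compared with the TOTAL column, and the average number of senses per lemma derived from them. The variable and function names are illustrative; the figures are copied from the table.

```python
# Counts per part of speech: noun, verb, adjective, adverb.
COUNTS = {
    "synsets": [5621, 2283, 775, 294],
    "lemmas": [4777, 2609, 1259, 479],
    "word_senses": [13060, 10062, 2054, 732],
}
REPORTED_TOTALS = {"synsets": 8973, "lemmas": 9124, "word_senses": 25908}


def totals(counts):
    """Sum each row across the four parts of speech."""
    return {row: sum(values) for row, values in counts.items()}


computed = totals(COUNTS)
print(computed == REPORTED_TOTALS)

# Average number of senses per Latin lemma, over all parts of speech.
senses_per_lemma = computed["word_senses"] / computed["lemmas"]
print(round(senses_per_lemma, 2))
```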
21.
- Latin WordNet can be browsed online:
http://multiwordnet.itc.it/english/home.php
- The Latin WordNet database will soon be
available from the European Language Resources
Association (ELRA): http://www.elra.info/