Title: The Collaborative Open Dictionary Development
1The Collaborative Open Dictionary Development
- Thatsanee Ch.
- thatsanee_at_tcllab.org
2The Collaborative Open Dictionary Development
- TCL's Computational Lexicon
- Asian WordNet
- Asian Language Resources
3Classical Semantic Computational Lexicons
- Representing the meaning of a word (minimally) requires:
- Distinguishing senses of a word
- We walked along the bank of the Chao Phra Ya river.
- He has an account at this bank.
- Indicating inferences
- Being human > Being animate
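The two requirements above can be illustrated with a minimal Python sketch. The sense labels, glosses, and the helper names `senses_of` and `entails` are assumptions made for illustration; they are not taken from any of the lexicons discussed in these slides.

```python
# Hypothetical sense inventory: labels and glosses are illustrative only.
SENSES = {
    "bank": {
        "bank/river": "sloping land beside a body of water",
        "bank/finance": "institution that holds accounts and money",
    }
}

# Inference links: having the key property implies having the value property.
IMPLIES = {"human": "animate"}


def senses_of(word):
    """Return the sense labels recorded for a word (empty list if unknown)."""
    return sorted(SENSES.get(word, {}))


def entails(prop, target):
    """Follow IMPLIES links from prop and report whether target is reached."""
    seen = set()
    while prop is not None and prop not in seen:
        if prop == target:
            return True
        seen.add(prop)
        prop = IMPLIES.get(prop)
    return False
```

Here `senses_of("bank")` lists both senses of the word, and `entails("human", "animate")` holds while the reverse direction does not.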
4Computational Lexicon
- Explicit representation of word meaning
- Word content accessible to computational agents
- Word meaning linked to word syntax and morphology
- Multilingual lexical links
- Language resources for NLP systems
- syntactic subcategorization frames for parsing
- semantic selectional preferences for ambiguity reduction
- semantic classes for WSD, semantic tagging, etc.
5Computational Lexicons
- Terminological based
- EDR
- Network based
- WordNet, Cyc
- FrameNet
- Constraint based
- UW
- TCLLEX
6EDR EJ Dictionary
Terminological based
<Record Number> EJB1054678
<Headword Information>
<Headword> belabor
<Grammar Information>
<Part of Speech> Verb
<Semantic Information>
<Concept Identifier> 3cecd7
<Headconcept>
<English Headconcept> blister
<Japanese Headconcept> ?????????
<Concept Explication>
<English Concept Explication> to attack with sharp words
<Japanese Concept Explication> ???????
<Correspondence Information>
<Correspondence Word Information>
<Correspondence Word Category> 0
<Correspondence Word Notation> (???)????????
<Correspondence Word Category> 0
<Correspondence Word Notation> (????)?????
<Correspondence Word Category> 0
<Correspondence Word Notation> ????
<Management Information>
<Management History Record> DATE"95/3/10"
7WordNet (Princeton)
Network based Terminological based
- A large lexical-semantic resource, organised as a semantic network.
- Goal: to create a lexical thesaurus (not a dictionary) that models the lexical organization used by humans.
- Words are grouped into synsets (sets of synonyms) to help identify a meaning and differentiate it from other meanings.
- The overall organizing principle of WordNet is in terms of semantic relations.
- Where no synonyms are available to distinguish concepts, glosses are used.
8WordNet
- About 150,000 lexical items
- http://wordnet.princeton.edu
9Semantic relations in WordNet
Network based Terminological based
- Synonymy (dog, canine)
- Antonymy (rich, poor)
- Hyponymy (maple, tree): IS-A relation
- Meronymy (body, limb): HAS-A relation (part of)
- Entailment (snore, sleep): for verbs
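The relations listed above can be sketched as labelled edges in a small graph. This is a toy network written for illustration, not the WordNet database or its API; the relation names and the extra tree-to-plant link are assumptions added to show that IS-A is transitive.

```python
from collections import defaultdict

# Relation triples from the examples above, plus one hypothetical link.
TRIPLES = [
    ("dog", "synonym", "canine"),
    ("rich", "antonym", "poor"),
    ("maple", "hypernym", "tree"),   # maple IS-A tree
    ("tree", "hypernym", "plant"),   # assumption, not from the slide
    ("body", "has_part", "limb"),    # body HAS-A limb
    ("snore", "entails", "sleep"),
]

graph = defaultdict(lambda: defaultdict(set))
for source, relation, target in TRIPLES:
    graph[source][relation].add(target)
    # Synonymy and antonymy are symmetric, so store both directions.
    if relation in ("synonym", "antonym"):
        graph[target][relation].add(source)


def related(word, relation):
    """Return the words linked to `word` by `relation`, sorted."""
    return sorted(graph[word][relation])


def is_a(word, ancestor):
    """True if `ancestor` is reachable from `word` through hypernym links."""
    stack, seen = [word], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current]["hypernym"])
    return False
```

With these triples, `is_a("maple", "plant")` holds through the intermediate node `tree`, while `is_a("tree", "maple")` does not.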
10WordNet
Network based Terminological based
11CYC (Cycorp, Inc.)
Network based
- Cyc KB (over 120,000 concepts and a million assertions)
- Ontology
- English lexicon
- A concept is defined as a constant, which can represent:
- A collection (e.g. the set of all people)
- An individual object (e.g. a particular person)
- A word (e.g. the English word)
- A relation (e.g. a predicate, function, slot, attribute)
- The entry for the predicate mother
- mother
- (mother ANIM FEM)
- isa FamilyRelationSlot BinaryPredicate
- the predicate mother takes 2 arguments,
- 1st must be an element of the collection Animal,
- 2nd must be an element of the collection FemaleAnimal
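The argument constraints on the predicate mother can be sketched as a simple membership check. This is an illustrative Python sketch, not Cyc's own language or API; the individuals `Rex` and `Bella` and the function `well_formed` are hypothetical.

```python
# Hypothetical membership facts; only the collection names come from the slide.
COLLECTIONS = {
    "Animal": {"Rex", "Bella"},
    "FemaleAnimal": {"Bella"},
}

# Argument constraints per predicate: one collection per argument position.
ARG_TYPES = {"mother": ("Animal", "FemaleAnimal")}


def well_formed(predicate, *args):
    """Check arity and that each argument belongs to its required collection."""
    expected = ARG_TYPES[predicate]
    if len(args) != len(expected):
        return False
    return all(arg in COLLECTIONS[coll] for arg, coll in zip(args, expected))
```

Under these assumed facts, `well_formed("mother", "Rex", "Bella")` is accepted, while swapping the arguments is rejected because `Rex` is not in `FemaleAnimal`.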
12FrameNet (Berkeley)
Frame based
- To create a computational lexicon which describes
the semantic frames and valencies of verbs,
nouns, and adjectives.
13Thematic Roles (Fillmore 1968)
- Thematic roles (TR) describe the conceptual participants in a situation in a generic way, independent of their grammatical realization.
- Agent, Patient, Object, Recipient, Instrument, Source, Goal, Beneficiary, Experiencer, ...
14Thematic Roles example annotated
- The window broke.
- A rock broke the window.
- John broke the window with a rock.
- [The window]_pat broke.
- [A rock]_inst broke [the window]_pat.
- [John]_ag broke [the window]_pat with [a rock]_inst.
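The annotated sentences above can be written down as data to make the point concrete: the window keeps the patient role whether it appears as subject or object. The representation below (lists of phrase/role pairs and the helper `roles`) is an assumption chosen for illustration.

```python
# Each sentence is a list of (phrase, role) pairs; role is None when unannotated.
ANNOTATED = [
    [("The window", "pat"), ("broke", None)],
    [("A rock", "inst"), ("broke", None), ("the window", "pat")],
    [("John", "ag"), ("broke", None), ("the window", "pat"),
     ("with", None), ("a rock", "inst")],
]


def roles(sentence):
    """Map each role label in a sentence to its (lower-cased) phrase."""
    return {role: phrase.lower() for phrase, role in sentence if role}


# The patient is the same participant in all three sentences.
patients = {roles(sentence)["pat"] for sentence in ANNOTATED}
```

The set `patients` contains a single phrase, even though the grammatical position of that phrase changes from sentence to sentence.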
15The Berkeley FrameNet (1996)
- Frames: an inventory of conceptual structures, each modelling a prototypical situation such as COMMERCIAL_TRANSACTION, COMMUNICATION_REQUEST, SELF_MOTION
- Semantic roles are valid only locally, within a frame; they are called Frame Elements (FE)
16The Berkeley FrameNet (1996)
- FEs of the COMMUNICATION_REQUEST frame:
- SPEAKER, ADDRESSEE, MESSAGE, ...
- FEs of the COMMERCIAL_TRANSACTION frame:
- BUYER, SELLER, GOODS, PRICE, ...
17The Berkeley FrameNet (1996)
- A set of target words is associated with each frame; for COMMERCIAL_TRANSACTION:
- Buy, sell, pay, spend, cost, charge
- Price, change, debt, credit, merchant, broker, shop
- Tip, fee, ...
18An example
- Airbus sells five A380 superjumbo planes to China Southern for 220 million Euro.
- China Southern buys five A380 superjumbo planes from Airbus for 220 million Euro.
- Airbus arranged with China Southern for the sale of five A380 superjumbo planes at a price of 220 million Euro.
- Five A380 superjumbo planes will go for 220 million Euro to China Southern.
(seller, buyer, goods, price)
19COMMERCIAL_TRANSACTION
- SELLER: Airbus
- BUYER: China Southern
- GOODS: five A380 superjumbo planes
- PRICE: 220 million Euro
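The frame instance above can be sketched in Python to show how differently worded sentences reduce to the same frame. The `VALENCE` table, which maps grammatical slots to frame elements for each target verb, is a hand-written assumption and not FrameNet data; `build_frame` is likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommercialTransaction:
    seller: str
    buyer: str
    goods: str
    price: str


# Hypothetical mapping from grammatical slots to frame elements, per target verb.
VALENCE = {
    "sell": {"subject": "seller", "object": "goods", "to": "buyer", "for": "price"},
    "buy": {"subject": "buyer", "object": "goods", "from": "seller", "for": "price"},
}


def build_frame(verb, slots):
    """Fill the frame elements from a verb's grammatical slots."""
    mapping = VALENCE[verb]
    return CommercialTransaction(
        **{mapping[slot]: filler for slot, filler in slots.items()}
    )


sold = build_frame("sell", {
    "subject": "Airbus",
    "object": "five A380 superjumbo planes",
    "to": "China Southern",
    "for": "220 million Euro",
})
bought = build_frame("buy", {
    "subject": "China Southern",
    "object": "five A380 superjumbo planes",
    "from": "Airbus",
    "for": "220 million Euro",
})
```

Both calls produce equal `CommercialTransaction` values, which is the sense in which the sell and buy sentences describe one and the same situation.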
20The Berkeley FrameNet (1996)
- Current release: 700 frames (8,000 lexical units)
- http://framenet.icsi.berkeley.edu/
21FrameNet
Frame based
22UNL Knowledge base UW
Constraint based
23(TCL's Computational Lexicon) TCLLEX
- Design the frame-based lexicon representation
- Create the Ontology and Terminology
- Propose a computational framework
- Reuse the existing conceptual hierarchy
(thesaurus) and lexicon
24TCL's Computational Lexicon
Constraint based
- Design...
- Representativity
- Logical and Semantic constraints
- Expressiveness
- Thoroughness and incrementality
- Computationality (Operations)
- Similarity...Differentiation
- Relativity
- Inheritance
- Unification
25Logical Constraints
Representativity
- Vertical relation
- Is-a: 189 classes
- The logical constraints can be attached to a word
of any category type. They illustrate the logical
relationship among word senses in the lexicon.
26Semantic Constraints
- The semantic constraints are attached to a verb or an adjective. They represent the relationship among thematic roles in a verb or adjective pattern.
- Horizontal relation: 16 relations
27Semantic Constraints
28Representation
- ???
- Morphological
- Syntactic
- Category: V
- Subcategory: VACT
- V Pattern: SUBVOBJ
- Semantic
- Logical Constraint
- Is-a: drive
- Semantic Constraint
- Agent: Individual
- Complement: Vehicle
- English: drive
- ???
- Morphological
- Syntactic
- Category: V
- Subcategory: VACT
- V Pattern: SUBVOBJ
- Semantic
- Logical Constraint
- Is-a: displace
- Semantic Constraint
- Agent: Person
- Object: Person
- English: expel
29Representation
- ???
- Morphological
- Syntactic
- Category: V
- Subcategory: VACT
- V Pattern: SUBVOBJ
- Semantic
- Logical Constraint
- Is-a: change
- Semantic Constraint
- Agent: Organic structure
- Object: Material
- English: eliminate
30Representation
- ????????????
- Morphological
- Syntactic
- Category: N
- Subcategory: NCMN
- Semantic
- Logical Constraint
- Is-a: Career
- English: tailor
- ??
- Morphological
- Syntactic
- Category: N
- Subcategory: NCMN
- Semantic
- Logical Constraint
- Is-a: Vehicle
- English: car
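The verb and noun entries shown so far can be combined to check a semantic constraint against a logical (is-a) constraint. The sketch below is an assumption about how such a check might look, not TCLLEX's actual implementation: entries are keyed by their English gloss because the Thai headwords are not available here, and the `PARENT` links above `Vehicle` and `Career` are hypothetical additions to show inheritance.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    english: str
    category: str
    subcategory: str
    is_a: str
    pattern: str = ""
    roles: dict = field(default_factory=dict)


# Entries transcribed from the slides, keyed by English gloss.
LEXICON = {
    "drive": Entry("drive", "V", "VACT", "drive", "SUBVOBJ",
                   {"Agent": "Individual", "Complement": "Vehicle"}),
    "car": Entry("car", "N", "NCMN", "Vehicle"),
    "tailor": Entry("tailor", "N", "NCMN", "Career"),
}

# Hypothetical upper part of the is-a hierarchy (not from the slides).
PARENT = {"Vehicle": "Artifact", "Career": "Role"}


def belongs_to(class_name, wanted):
    """Walk up the is-a chain from class_name looking for wanted."""
    while class_name is not None:
        if class_name == wanted:
            return True
        class_name = PARENT.get(class_name)
    return False


def can_fill(verb, role, noun):
    """True if the noun's class satisfies the verb's constraint on the role."""
    return belongs_to(LEXICON[noun].is_a, LEXICON[verb].roles[role])
```

With these entries, `can_fill("drive", "Complement", "car")` succeeds because car is-a Vehicle, while `tailor` is rejected in the same role.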
31Representation
- ???
- Morphological
- Syntactic
- Category: N
- Subcategory: NCMN
- Semantic
- Logical Constraint
- Is-a: Container
- English: bowl
- ???
- Morphological
- Syntactic
- Category: V
- Subcategory: VACT
- V Pattern: SUBV
- Semantic
- Logical Constraint
- Is-a: utter
- Semantic Constraint
- Agent: Fowl
- English: crow
32Representation
- ????????????? ??? 1 Object Container
- ???????? ??? 2 Object Implement
- ????????????????????
- ??????????????????????????
- ???????????????? ??? 3 Object Collector
- ??????????????????? ??? 4 Object Lane
- ?????? ??? 5 Object Body part
- ?????? ??? 5 Object Body part
- ??????????????????????????????
33Expressiveness (content words)
- Express all part of speech (content words)
- Distinguish senses by the framework
- ???
- Morphological
- Syntactic
- Category: N
- Subcategory: NCMN
- Semantic
- Logical Constraint
- Is-a: Container
- English: bowl
- ???
- Morphological
- Syntactic
- Category: V
- Subcategory: VACT
- V Pattern: SUBV
- Semantic
- Logical Constraint
- Is-a: utter
- Semantic Constraint
- Agent: Fowl
- English: crow
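The noun and verb entries above can be told apart by their syntactic category and, for the verb, by the class required of its agent. The following sketch shows one possible way to select a sense from such entries; the dictionary layout and the function `select_sense` are assumptions for illustration, not part of TCLLEX.

```python
# Two senses transcribed from the entries above (Thai headwords omitted).
SENSES = [
    {"english": "bowl", "category": "N", "subcategory": "NCMN",
     "is_a": "Container"},
    {"english": "crow", "category": "V", "subcategory": "VACT",
     "pattern": "SUBV", "is_a": "utter", "agent": "Fowl"},
]


def select_sense(senses, category, agent_class=None):
    """Return English glosses of the senses compatible with the context.

    A sense is kept if its category matches and, when both the sense and
    the context specify an agent class, those classes agree.
    """
    matches = []
    for sense in senses:
        if sense["category"] != category:
            continue
        required = sense.get("agent")
        if required is not None and agent_class is not None \
                and required != agent_class:
            continue
        matches.append(sense["english"])
    return matches
```

In a noun position only the `bowl` sense survives; in a verb position with a `Fowl` subject the `crow` sense is selected, and with a subject of another class no sense remains.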
34Computationality
(Diagram of Thai words linked to semantic classes; surviving labels: object, vehicle, material.)
35Computationality
(Diagram of Thai words linked to semantic classes; surviving labels: object, result, plant, body part, lane, clothing, monetary, price.)
36TCLLEX Statistics
46Asian WordNet
- English WordNet
- Asian Language Terminology
47Asian WordNet Construction
54http://www.tcllab.org/tcllex