Title: Lecture 5: Lexical Relations
1Lecture 5: Lexical Relations & WordNet
SIMS 202 Information Organization and Retrieval
- Prof. Ray Larson & Prof. Marc Davis
- UC Berkeley SIMS
- Tuesday and Thursday 10:30 am - 12:00 pm
- Fall 2003
- http://www.sims.berkeley.edu/academics/courses/is202/f03/
2Lecture Overview
- Review
- Lexical Relations
- WordNet
- Demo
- Discussion Questions
- Action Items for Next Time
Credit for some of the slides in this lecture
goes to Marti Hearst and Warren Sack
4Definition of AI
- "... artificial intelligence [AI] is the science
of making machines do things that would require
intelligence if done by humans" (Minsky, 1963)
5The Goals of AI Are Not New
- Ancient Greece
- Daedalus' automata
- Judaism's myth of the Golem
- 18th century automata
- Singing, dancing, playing chess?
- Mechanical metaphors for mind
- Clock
- Telegraph/telephone network
- Computer
6Some Areas of AI
- Knowledge representation
- Programming languages
- Natural language understanding
- Speech understanding
- Vision
- Robotics
- Planning
- Machine learning
- Expert systems
- Qualitative simulation
7AI or IA?
- Artificial Intelligence (AI)
- Make machines as smart as (or smarter than) people
- Intelligence Amplification (IA)
- Use machines to make people smarter
8Furnas: The Vocabulary Problem
- People use different words to describe the same things
- "If one person assigns the name of an item, other
untutored people will fail to access it on 80 to
90 percent of their attempts."
- "Simply stated, the data tell us there is no one
good access term for most objects."
9The Vocabulary Problem
- How is it that we come to understand each other?
- Shared context
- Dialogue
- How can machines come to understand what we say?
- Shared context?
- Dialogue?
10Vocabulary Problem Solutions?
- Furnas et al.
- Make the user memorize precise system meanings
- Have the user and system interact to identify the precise referent
- Provide infinite aliases to objects
- Minsky and Lenat
- Give the system commonsense so it can
understand what the user's words can mean
11CYC
- Decades-long effort to build a commonsense knowledge base
- Storied past
- 100,000 basic concepts
- 1,000,000 assertions about the world
- The validity of Cyc's assertions is
context-dependent (default reasoning)
12Cyc Examples
- Cyc can find the match between a user's query for
"pictures of strong, adventurous people" and an
image whose caption reads simply "a man climbing a cliff"
- Cyc can notice if an annual salary and an hourly
salary are inadvertently being added together in a spreadsheet
- Cyc can combine information from multiple
databases to guess which physicians in practice
together had been classmates in medical school
- When someone searches for "Bolivia" on the Web,
Cyc knows not to offer a follow-up question like
"Where can I get free Bolivia online?"
13Cyc Applications
- Applications currently available or in development
- Integration of Heterogeneous Databases
- Knowledge-Enhanced Retrieval of Captioned Information
- Guided Integration of Structured Terminology (GIST)
- Distributed AI
- WWW Information Retrieval
- Potential applications
- Online brokering of goods and services
- "Smart" interfaces
- Intelligent character simulation for games
- Enhanced virtual reality
- Improved machine translation
- Improved speech recognition
- Sophisticated user modeling
- Semantic data mining
14Cyc's Top-Level Ontology
- Fundamentals
- Top Level
- Time and Dates
- Types of Predicates
- Spatial Relations
- Quantities
- Mathematics
- Contexts
- Groups
- "Doing"
- Transformations
- Changes Of State
- Transfer Of Possession
- Movement
- Parts of Objects
- Composition of Substances
- Agents
- Organizations
- Actors
- Roles
- Professions
- Emotion
- Propositional Attitudes
- Social
- Biology
- Chemistry
- Physiology
- General Medicine
- Materials
- Waves
- Devices
- Construction
- Financial
- Food
- Clothing
- Weather
- Geography
- Transportation
- Information
- Perception
- Agreements
- Linguistic Terms
- Documentation
http://www.cyc.com/cyc-2-1/toc.html
16Syntax
- The syntax of a language is to be understood as a
set of rules which accounts for the distribution
of word forms throughout the sentences of a language
- These rules codify permissible combinations of
classes of word forms
17Semantics
- Semantics is the study of linguistic meaning
- Two standard approaches to lexical semantics
(cf. sentential semantics and logical semantics)
- (1) compositional
- (2) relational
18Lexical Semantics: Compositional Approach
- "Compositional lexical semantics, introduced by
Katz & Fodor (1963), analyzes the meaning of a
word in much the same way a sentence is analyzed
into semantic components. The semantic components
of a word are not themselves considered to be
words, but are abstract elements (semantic atoms)
postulated in order to describe word meanings
(semantic molecules) and to explain the semantic
relations between words. For example, the
representation of bachelor might be ANIMATE and
HUMAN and MALE and ADULT and NEVER MARRIED. The
representation of man might be ANIMATE and HUMAN
and MALE and ADULT; because all the semantic
components of man are included in the semantic
components of bachelor, it can be inferred that
bachelor → man. In addition, there are
implicational rules between semantic components,
e.g. HUMAN → ANIMATE, which also look very much
like meaning postulates."
- George Miller, On Knowing a Word, 1999
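The bachelor/man inference above can be sketched as a subset test over sets of semantic components. This is only an illustration: the dictionary name COMPONENTS and the function entails are made up here, and the component sets are simply the ones listed in the quote.

```python
# Semantic components from the slide's example (NEVER MARRIED written
# with an underscore); the names below are illustrative assumptions.
COMPONENTS = {
    "man": frozenset({"ANIMATE", "HUMAN", "MALE", "ADULT"}),
    "bachelor": frozenset({"ANIMATE", "HUMAN", "MALE", "ADULT", "NEVER_MARRIED"}),
}


def entails(word_a, word_b):
    """Return True if every component of word_b is also a component of word_a."""
    return COMPONENTS[word_b] <= COMPONENTS[word_a]


print(entails("bachelor", "man"))  # True: bachelor -> man
print(entails("man", "bachelor"))  # False: man lacks NEVER_MARRIED
```

The inference runs in one direction only because the subset relation does: the more specific word carries more components.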
19Lexical Semantics: Relational Approach
- "Relational lexical semantics was first introduced
by Carnap (1956) in the form of meaning
postulates, where each postulate stated a
semantic relation between words. A meaning
postulate might look something like dog → animal
(if x is a dog then x is an animal) or, adding
logical constants, bachelor → man and never
married [if x is a bachelor then x is a man and
not(x has married)] or tall → not short [if x is
tall then not(x is short)]. The meaning of a
word was given, roughly, by the set of all
meaning postulates in which it occurs."
- George Miller, On Knowing a Word, 1999
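One way to picture meaning postulates is as rules mapping a predicate to the literals it implies. The sketch below is a hypothetical encoding, not Carnap's or Miller's notation: POSTULATES and consequences are invented names, and a literal is written as a (predicate, truth value) pair.

```python
from __future__ import annotations

# Hypothetical encoding of the three postulates quoted above.
POSTULATES = {
    "dog": [("animal", True)],
    "bachelor": [("man", True), ("has_married", False)],
    "tall": [("short", False)],
}


def consequences(predicate: str) -> set[tuple[str, bool]]:
    """Collect every literal that follows from asserting `predicate` of some x."""
    derived: set[tuple[str, bool]] = set()
    pending = [(predicate, True)]
    while pending:
        name, value = pending.pop()
        if (name, value) in derived:
            continue
        derived.add((name, value))
        # In this sketch only positive literals trigger further postulates.
        if value:
            pending.extend(POSTULATES.get(name, []))
    return derived


print(sorted(consequences("bachelor")))
# [('bachelor', True), ('has_married', False), ('man', True)]
```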
20Pragmatics
- Deals with the relation between signs or
linguistic expressions and their users
- Deixis (literally "pointing out")
- E.g., "I'll be back in an hour" depends upon the
time of the utterance
- Conversational implicature
- A: Can you tell me the time?
- B: Well, the milkman has come. [I don't know
exactly, but perhaps you can deduce it from some
extra information I give you.]
- Presupposition
- "Are you still such a bad driver?"
- Speech acts
- Constatives vs. performatives
- E.g., "I second the motion."
- Conversational structure
- E.g., turn-taking rules
21Language
- Language only hints at meaning
- Most meaning of text lies within our minds and
common understanding
- "How much is that doggy in the window?"
- "How much": social system of barter and trade (not
the size of the dog)
- "doggy" implies childlike, plaintive, probably
cannot do the purchasing on their own
- "in the window" implies behind a store window,
not really inside a window; requires notion of
window shopping
22Semantics: The Meaning of Symbols
- Semantics versus Syntax
- add(3,4)
- 3 + 4
- (different syntax, same meaning)
- Meaning versus Representation
- What a person's name is versus who they are
- A rose by any other name...
- What the computer program looks like versus
what it actually does
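The add(3,4) versus 3 + 4 contrast can be checked directly in Python, as a rough sketch: the standard ast module shows that the two expressions parse to different trees, while evaluation gives the same value. Binding the name add to operator.add is an assumption made for the example.

```python
import ast
import operator

prefix_form = "add(3, 4)"
infix_form = "3 + 4"

# Syntax: the two expressions parse to differently shaped trees.
prefix_tree = ast.dump(ast.parse(prefix_form, mode="eval"))
infix_tree = ast.dump(ast.parse(infix_form, mode="eval"))
print(prefix_tree != infix_tree)  # True

# Semantics: both denote the same value once "add" is bound to addition.
environment = {"add": operator.add}
prefix_value = eval(prefix_form, environment)
infix_value = eval(infix_form, environment)
print(prefix_value, infix_value)  # 7 7
```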
23Semantics
- Semantics: assigning meanings to symbols and expressions
- Usually involves defining
- Objects
- Properties of objects
- Relations between objects
- More detailed versions include
- Events
- Time
- Places
- Measurements (quantities)
24The Role of Context
- The concept associated with the symbol "21" means
different things in different contexts
- Examples?
- The question "Is there any salt?"
- Asked of a waiter at a restaurant
- Asked of an environmental scientist at work
25What's in a Sentence?
- "A sentence is not a verbal snapshot or movie
of an event. In framing an utterance, you have to
abstract away from everything you know, or can
picture, about a situation, and present a
schematic version which conveys the essentials.
In terms of grammatical marking, there is not
enough time in the speech situation for any
language to allow for the marking of everything
which could possibly be significant to the
message."
- Dan Slobin, in Language Acquisition: The State of
the Art, 1982
26Lexical Relations
- Conceptual relations link concepts
- Goal of Artificial Intelligence
- Lexical relations link words
- Goal of Linguistics
27Major Lexical Relations
- Synonymy
- Polysemy
- Metonymy
- Hyponymy/Hypernymy
- Meronymy/Holonymy
- Antonymy
28Synonymy
- Different ways of expressing related concepts
- Examples
- cat, feline, Siamese cat
- Overlaps with basic and subordinate levels
- Synonyms are almost never truly substitutable
- Used in different contexts
- Have different implications
- This is a point of contention
29Polysemy
- Most words have more than one sense
- Homonym: same sound and/or spelling, different
meaning (http://www.wikipedia.org/wiki/Homonym)
- bank (river)
- bank (financial)
- Polysemy: different senses of same word
(http://www.wikipedia.org/wiki/Polysemy)
- That dog has floppy ears.
- She has a good ear for jazz.
- bank (financial) has several related senses
- the building, the institution, the notion of
where money is stored
30Metonymy
- Use one aspect of something to stand for the whole
- The building stands for the institution of the bank.
- Newscast: "The White House released new figures today."
- Waitperson: "The ham sandwich spilled his drink."
31Hyponymy/Hyperonymy
- ISA relation
- Related to Superordinate and Subordinate level categories
- hyponym(robin,bird)
- hyponym(emu,bird)
- hyponym(bird,animal)
- hypernym(animal,bird)
- A is a hypernym of B if B is a type of A
- A is a hyponym of B if A is a type of B
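The hyponym facts above can be stored as direct ISA links and queried by walking upward. This is a minimal sketch under two assumptions not stated on the slide: each word has a single direct hypernym, and the relation is treated as transitive (so robin counts as a hyponym of animal). All names are illustrative.

```python
# Direct links from the slide: hyponym(robin,bird), hyponym(emu,bird),
# hyponym(bird,animal). Single inheritance is assumed for simplicity.
DIRECT_HYPERNYM = {
    "robin": "bird",
    "emu": "bird",
    "bird": "animal",
}


def hypernyms(word):
    """Return all hypernyms of word, nearest first, by following ISA links upward."""
    chain = []
    while word in DIRECT_HYPERNYM:
        word = DIRECT_HYPERNYM[word]
        chain.append(word)
    return chain


def is_hyponym(a, b):
    """A is a hyponym of B if A is a type of B."""
    return b in hypernyms(a)


def is_hypernym(a, b):
    """A is a hypernym of B if B is a type of A."""
    return is_hyponym(b, a)


print(hypernyms("robin"))             # ['bird', 'animal']
print(is_hypernym("animal", "bird"))  # True
```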
32Basic-Level Categories (Review)
- Brown 1958, 1965; Berlin et al., 1972, 1973
- Folk biology
- Unique beginner: plant, animal
- Life form: tree, bush, flower
- Generic name: pine, oak, maple, elm
- Specific name: Ponderosa pine, white pine
- Varietal name: Western Ponderosa pine
- No overlap between levels
- Level 3 is basic
- Corresponds to genus
- Folk biological categories correspond accurately
to scientific biological categories only at the
basic level
33Psychologically Primary Levels
- SUPERORDINATE: animal, furniture
- BASIC LEVEL: dog, chair
- SUBORDINATE: terrier, rocker
- Children take longer to learn superordinate
- Superordinate not associated with mental images
or motor actions
34Meronymy/Holonymy
- Part/Whole relation
- meronym(beak,bird)
- meronym(bark,tree)
- holonym(tree,bark)
- Transitive conceptually but not lexically
- The knob is a part of the door.
- The door is a part of the house.
- ? The knob is a part of the house ?
- Holonyms are (approximately) the inverse of
meronyms
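A small sketch of the part/whole facts above, assuming they are stored as explicit (part, whole) pairs. Holonymy is derived as the inverse lookup, and no transitive closure is computed, which mirrors the point that the relation is not lexically transitive. The names are illustrative.

```python
# (part, whole) pairs taken from the slide's examples.
MERONYM_PAIRS = {
    ("beak", "bird"),
    ("bark", "tree"),
    ("knob", "door"),
    ("door", "house"),
}


def is_meronym(part, whole):
    """True only if the pair is recorded directly; no chaining is attempted."""
    return (part, whole) in MERONYM_PAIRS


def is_holonym(whole, part):
    """Holonymy treated as the inverse of meronymy."""
    return is_meronym(part, whole)


print(is_holonym("tree", "bark"))   # True
print(is_meronym("knob", "house"))  # False: not inferred from knob/door/house
```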
35Antonymy
- Lexical opposites
- antonym(large, small)
- antonym(big, small)
- antonym(big, little)
- but not large, little
- Many antonymous relations can be reliably
detected by looking for statistical correlations
in large text collections. (Justeson & Katz 91)
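To give a feel for what "statistical correlations" could mean here, the sketch below compares how often two adjectives share a sentence with how often they would if they occurred independently. It is not the procedure from Justeson & Katz; the eight sentences are an invented toy corpus and the function name is hypothetical.

```python
# Invented toy corpus, for illustration only.
TOY_SENTENCES = [
    "the big dog chased the little cat",
    "a large house and a small garden",
    "big plans need little steps",
    "the large box held a small gift",
    "a big tree grew nearby",
    "the little bird sang",
    "a large crowd gathered",
    "a small boat drifted",
]


def cooccurrence_ratio(word_a, word_b, sentences):
    """Observed same-sentence co-occurrences divided by the count expected
    if the two words were distributed independently across sentences."""
    token_sets = [set(sentence.split()) for sentence in sentences]
    total = len(token_sets)
    count_a = sum(word_a in tokens for tokens in token_sets)
    count_b = sum(word_b in tokens for tokens in token_sets)
    count_both = sum(word_a in tokens and word_b in tokens for tokens in token_sets)
    expected = count_a * count_b / total
    return count_both / expected if expected else 0.0


print(round(cooccurrence_ratio("big", "little", TOY_SENTENCES), 2))  # 1.78
print(cooccurrence_ratio("large", "little", TOY_SENTENCES))          # 0.0
```

A ratio above 1 means the pair shares sentences more often than chance would predict; on a corpus this small the numbers only illustrate the calculation.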
36Thesauri and Lexical Relations
- Polysemy: same word, different senses of meaning
- Slightly different concepts expressed similarly
- Synonyms: different words, related senses of meanings
- Different ways to express similar concepts
- Thesauri help draw all these together
- Thesauri also commonly define a set of relations
between terms that is similar to lexical relations
- BT (broader term), NT (narrower term), RT (related term)
- More on Thesauri next week
37What is an Ontology?
- From Merriam-Webster's Collegiate
- A branch of metaphysics concerned with the nature
and relations of being
- A particular theory about the nature of being or
the kinds of existence
- More prosaically
- A carving up of the world's meanings
- Determine what things exist, but not how they inter-relate
- Related terms
- Taxonomy, dictionary, category structure
- Commonly used now in CS literature to describe
structures that function as Thesauri
39WordNet
- Started in 1985 by George Miller, students, and
colleagues at the Cognitive Science Laboratory,
Princeton University
- Miller also known as the author of the paper "The
Magical Number Seven, Plus or Minus Two: Some
Limits on our Capacity for Processing
Information" (1956)
- Can be downloaded for free
- www.cogsci.princeton.edu/wn/
40Miller on WordNet
- "In terms of coverage, WordNet's goals differ
little from those of a good standard
college-level dictionary, and the semantics of
WordNet is based on the notion of word sense that
lexicographers have traditionally used in writing
dictionaries. It is in the organization of that
information that WordNet aspires to innovation."
- (Miller, 1998, Chapter 1)
41Presuppositions of WordNet Project
- Separability hypothesis
- The lexical component of language can be
separated and studied in its own right
- Patterning hypothesis
- People have knowledge of the systematic patterns
and relations between word meanings
- Comprehensiveness hypothesis
- Computational linguistics programs need a store
of lexical knowledge that is as extensive as that
which people have
42WordNet Size
WordNet uses synsets: sets of synonymous terms
POS         Unique Strings   Synsets
Noun        107930           74488
Verb        10806            12754
Adjective   21365            18523
Adverb      4583             3612
Totals      144684           109377
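The distinction between strings and synsets in the table can be sketched with a toy structure: each synset is a set of word strings, and one string may belong to several synsets (polysemy). The synset identifiers and memberships below are invented for illustration and are not actual WordNet entries.

```python
from collections import defaultdict

# Hypothetical synsets: identifier -> set of synonymous word strings.
TOY_SYNSETS = {
    "bank#river": {"bank", "riverbank"},
    "bank#financial": {"bank", "depository"},
    "cat#animal": {"cat", "feline"},
}

# Invert the mapping: word string -> identifiers of every synset containing it.
SENSES = defaultdict(set)
for synset_id, words in TOY_SYNSETS.items():
    for word in words:
        SENSES[word].add(synset_id)

unique_strings = len(SENSES)
synset_count = len(TOY_SYNSETS)
print(unique_strings, synset_count)  # 5 3
print(sorted(SENSES["bank"]))        # ['bank#financial', 'bank#river']
```

Counting keys on each side of the mapping gives the two columns of the table: unique strings on one side, synsets on the other.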
43Structure of WordNet
44Structure of WordNet
45Structure of WordNet
46Unique Beginners
- Entity, something
- (anything having existence (living or nonliving))
- Psychological_feature
- (a feature of the mental life of a living organism)
- Abstraction
- (a general concept formed by extracting common
features from specific examples)
- State
- (the way something is with respect to its main
attributes; "the current state of knowledge";
"his state of health"; "in a weak financial state")
- Event
- (something that happens at a given place and time)
47Unique Beginners
- Act, human_action, human_activity
- (something that people do or cause to happen)
- Group, grouping
- (any number of entities (members) considered as a unit)
- Possession
- (anything owned or possessed)
- Phenomenon
- (any state or process known through the senses
rather than by intuition or reasoning)
49WordNet Demo
- Available online (from Unix) if you wish to try it
- Log in to irony and type "wn word" for any word
you are interested in
- Demo
51Discussion Questions
- Joe Hall on Lexical Relations and WordNet
- Which method of linguistic analysis do you think
will be more fruitful... the painstaking process
involved with building WordNet or the relatively
easy output afforded by Church et al.'s
computational method that, however, requires much
work to decipher the results?
52Discussion Questions
- Joe Hall on Lexical Relations and WordNet
- What are the problems/advantages of using the
World Wide Web itself as a "corpus"? (If you were
to incorporate the current digital copies of all
newspapers, journals, etc. wouldn't you very
quickly exceed the 15 Million words of the
largest corpus in the Church article?)
53Discussion Questions
- Joe Hall on Lexical Relations and WordNet
- With the diversity of dialects of the English
language, how much does this type of
computational analysis get confused by phrases
such as "What up?" (i.e., slang)? Aren't these
some of the more interesting parts of language
(i.e., how language evolves)?
55Homework
- Read Chapters 3 and 5 of The Organization of
Information (Textbook)
- Discussion Question volunteers?
- Tu Tran
- Hong Qu
56Next Time